Genome research has fueled optimism because it promised to improve our
understanding of the mechanisms that lead to genetic diseases. In the context of
human health, genetics involves the study of single genes and their regulation to
improve public health and prevent disease. Genetic research helps to identify
diseases and health problems that are more likely to be influenced by genetic
factors. Genetic tests enable risk assessment and determine an individual's
predisposition to various diseases by uncovering mutations or variations in the
genome. Such information may be useful in managing an individual's lifestyle and
healthcare. In addition to testing for particular conditions, genetic research
provides solutions to health problems caused by genetic abnormalities and
mutations, either through medication or through genetic modification. Most genetic
disorders cannot be cured; however, many people have restored their health and
avoided potentially life-threatening diseases with the help of genetic research, by
taking due precautions coupled with advanced medicines and lifestyle changes.

Continuous technological improvements in DNA sequencing have created an
environment in which the genomes of a large number of disease-causing microbes and
viruses are sequenced on a regular basis. The availability and integration of
genetic information have been the driving forces behind our understanding of
normal and abnormal genomes.

We believe that newer and far more formidable diseases will continue to emerge,
and so will the quest to fight them. Conceptually, advances in genetic knowledge
fueled by technology could be used to prevent diseases, creating a much healthier
gene pool. Thus, analysis of both normal and diseased genomes will continue to
advance our knowledge, offering hope of a healthier world.

A human genome contains approximately 3.3 billion DNA bases. By comparing an
individual's sequence to a human genome reference sequence, DNA changes can be
detected at almost any base position. Depending on their location, changes in the
DNA bases may or may not alter gene function. Even if they do not affect a gene,
the changes may still affect the genetic structure of an individual. The data
obtained from the genome sequence are run through various bioinformatics annotation
tools and analyzed to decipher the reasons behind these DNA changes or variations
that may have an impact on an individual's health. Some of the DNA changes
identified in the sequence are linked to genetic disorders that can be inherited
within a family. These variations can affect the molecular pathways of the cell,
leading to alterations in physical traits, or can be linked to risk for common
diseases.
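
The comparison of an individual's sequence against a reference can be illustrated
with a minimal sketch. It assumes two short, already aligned sequences of equal
length and reports only single-base substitutions; the function name
find_substitutions and the example sequences are hypothetical, and real pipelines
rely on dedicated alignment and variant-calling tools rather than a direct
base-by-base comparison.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length, so
    insertions and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != alt_base
    ]


# Hypothetical toy sequences, not taken from any real genome.
reference = "ATGGCCATTGTA"
sample = "ATGGCTATTGTC"
variants = find_substitutions(reference, sample)
for pos, ref_base, alt_base in variants:
    print(f"position {pos}: {ref_base}>{alt_base}")
```

Each reported position would then be passed to annotation tools to judge whether
the change falls inside a gene and whether it is likely to alter gene function.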

With the advent of new high-throughput technologies, the conventional focus on
genetics and single genes is shifting toward the study of the whole genome,
including exome sequencing, the study of complex genes, gene-gene interactions, and
the association between genes and environment (epigenetics). This evolution in
genomics, genetics, and related molecular biology technologies has created
substantial avenues for the advanced understanding, prevention, treatment, and cure
of human diseases.

This book is intended to provide basic information on genome analysis and its
impact on human health. It focuses on the different approaches that have been
adopted to address issues related to human health, including cancer.
Additionally, it covers the areas that must still be explored in order to
understand the signaling processes in the genome and gene-gene interactions,
encompassing a large number of still undefined and poorly understood interactomes
that affect human health.

Genetic and Epigenetic Regulation
of Autophagy in Cancer 


Anup S. Pathania, Ubaid S. Makhdoomi, 
and Fayaz A. Malik 


1.1 Introduction 


Cancer is a class of diseases characterized by abnormal cell growth and division.
Cancer cells grow rapidly and uncontrollably compared with normal cells and form
lumps or tissue masses called tumors (except in leukemia). Solid tumors are benign
as long as they remain localized to their tissue of origin and become malignant
when cells migrate through the blood or lymphatic system to distant vital tissues
of the body such as the brain, bone, liver, and lung. The transformation of normal
cells into cancerous cells is a multistep process caused by mutations. The process
of accumulating mutations normally takes many years, and several mutations are
needed for a normal cell to acquire such oncogenic behavior. In all cancers, these
mutations are mainly found in tumor suppressor genes and proto-oncogenes.
Mutations in tumor suppressor genes cause loss of function and inactivate their
inhibitory effects on cell growth and division.
Such mutations are known as loss-of-function mutations, and they are common in
cancer cells. Common examples of tumor suppressor genes bearing such mutations in
cancer cells are the retinoblastoma gene (RB), p53, BRCA (breast cancer genes), APC
(adenomatous polyposis coli), PTEN (phosphatase and tensin homologue), and p27
(Lee and Muller 2010). In normal cells, the regulated counterparts of oncogenes
are known as proto-oncogenes, which control cell division and proliferation.
Mutations in proto-oncogenes deregulate their activities (gain of function),
converting them into cancer-forming oncogenes such as RAS, Myc, HER2, Cyclin D,
and Bcl-2 (Lee and Muller 2010). These genetic changes include point mutations,
insertions, deletions, gene amplifications, and chromosomal translocations that
place a proto-oncogene next to another gene and dysregulate its expression. The
factors behind these genetic changes are random, are not cell specific, and differ
among cancer types. In lung cancer, for example, 90% of cases are associated with
cigarette smoking; the risk increases with tobacco dose, and smoking is the cause
of the mutations in lung cells (Sasco et al. 2004). Similarly, the use of alcohol,
tobacco, and human papillomavirus or Epstein-Barr virus infection are important 
risk factors for head and neck carcinomas (Goldenberg et al. 2004; Leemans et al. 
2011). The presence of carcinogens like heterocyclic amines (HCAs), N-nitroso 
compounds (NOCs), and heme in red meat damages the DNA of cells that line the 
digestive system (Alexander and Cushing 2011). However, the reasons why normal
cells acquire mutations in key genes and become nonresponsive to cellular
homeostasis are still not understood. A nonsmoker can develop lung cancer; women
who eat a normal diet and exercise regularly, with no family history of breast
cancer, can still develop this type of tumor. As our understanding of cancer
continues to grow, it has been established that besides environmental factors,
genetic predisposition also plays a major role in cancer development. Genetic
predisposition means an increased likelihood of developing a particular disease
based on the genetic makeup a person has inherited from his or her parents or
ancestors. Mammalian cells have two copies of each gene, and as long as a cell
contains at least one functional copy, the gene remains fully functional.
Examples of such genetic mutations in mammalian cells include those in the
retinoblastoma gene (RB), p53, BRCA1 and BRCA2 in breast cancer, TERT in melanoma,
and APC in colon cancer. However, it should be noted that more than 75% of cancers
are sporadic, which means they occur by chance and have no familial history. Apart
from genetic defects, epigenetic changes are equally responsible for cancer
development and progression. Although epigenetic alterations do not involve any
change in the cellular DNA sequence, they are mitotically and meiotically
heritable. Epigenetic changes can switch genes on or off by controlling their
transcription. Common epigenetic events in cells include methylation of cytosine
bases within CpG dinucleotides, which are found in the 5' regulatory regions of
many genes and in almost all housekeeping genes. Similarly, acetylation or
deacetylation occurs on lysine residues within the N-terminal tails of the core
histones of the nucleosome as part of gene regulation. Small noncoding regulatory
RNAs (siRNAs) also play a critical role in tumorigenesis. In this chapter, we
discuss the role of genetic and epigenetic changes associated with the process of
autophagy and its implications in cancer. Autophagy is a catabolic process that
uses the cell's lysosomal machinery to degrade unnecessary or dysfunctional
cellular constituents. This process not only eliminates damaged organelles but
also provides raw materials to cells under stress-related conditions to maintain
homeostasis. Autophagy has been
shown to play an important role in cancer progression (Choi 2012), angiogenesis 
(Du et al. 2012), epithelial to mesenchymal transition (Li et al. 2013), metastasis
(Peng et al. 2013), resistance (Peng et al. 2013), inflammation (Levine et al. 2011), 
infection (Deretic 2010), and neurological disorders (Nixon and Yang 2012). 
However, prolonged autophagy activation may also lead to cell death due to 
excessive catabolism, known as type II programmed cell death or autophagic cell 
death. 


1.2 The Process of Autophagy


Autophagy starts with the formation of a membranous structure, the phagophore, at
a cytoplasmic site known as the pre-autophagosomal structure (PAS), first
discovered in yeast (Klionsky 2007). The phagophore membrane is contributed mainly
by the ER, Golgi, and endosomes under the control of various signaling events.
Sequential activation of specific autophagy-related genes (ATGs) is involved in
the formation of the phagophore. ATGs were first identified in yeast through
genetic screening, and many of their homologues have subsequently been found and
characterized in higher eukaryotes. The autophagic process mainly involves five
events: induction, vesicle nucleation, vesicle elongation, autophagosome
formation, and autophagosome-lysosome fusion. The main regulator of
autophagy induction in cells is the mammalian target of rapamycin (mTOR) (Jung 
et al. 2009), a nutrient sensor of the cell that inhibits autophagy under nutrient- 
rich conditions. mTOR phosphorylates autophagy protein ATG13L at multiple 
serine residues and mammalian homologues of ATG1 protein ULK-1 or ULK-2 
(UNC-51-like kinases). mTOR-mediated phosphorylation of ULK-1 and ULK-2 
inhibits their activity, rendering them unable to phosphorylate and activate focal 
adhesion kinase family-interacting protein of 200 kD (FIP200). ULK-1, ULK-2, 
and ATG13 form a complex with FIP200 known as ATG13L-ULK-1/2-FIP200 
complex which recruits other proteins for autophagosome formation. During 
nutrient starvation, cellular ATP levels decrease and, as a consequence, AMP
levels increase, which activates the energy sensor protein AMPK (AMP-activated
protein kinase). AMPK inhibits mTOR activity and promotes hypophosphorylation of
ATG13L, which favors the formation of the active ATG13L-ULK-1/2-FIP200 complex
and hence the induction of autophagy (Jung
et al. 2009). Vesicle nucleation starts with the recruitment of ATGs to the growing 
phagophore. Although the process is poorly understood, type III PI3 kinases play
an important role in the recruitment of ATGs to the phagophore. Type III PI3
kinases consist of a single catalytic subunit, VPS34 (homologue of yeast vacuolar
protein sorting-associated protein 34), and produce PtdIns3P
(phosphatidylinositol 3-phosphate). VPS34 interacts with the autophagic proteins
Beclin and VPS15 and forms the type III PI3 kinase complex, also known as the
Beclin-VPS34-VPS15 complex. This complex is recruited to the PAS by ATG14, where
it acts as a localization
site for most ATG proteins that facilitate autophagosome formation (Mizushima 
et al. 2011). The third step, vesicle elongation, involves the expansion of 
phagophore membranes around the PAS for the formation of autophagosomes. Two
ubiquitin-like conjugation systems, ATG5-ATG12 and ATG8, are involved in this
process (Nakatogawa 2013). Elaborating on these systems is beyond the scope of
this chapter. However, these interactions lead to the lipidation of ATG8 (LC3B in
mammals), a typical marker of autophagy, with phosphatidylethanolamine (PE), which
inserts ATG8 into autophagosome membranes. Once the autophagosomes are formed,
they fuse with lysosomes to form autolysosomes, in which the autophagic
substrates are degraded and recycled (Fig. 1.1).

Fig. 1.1 The autophagic process. Formation of autophagosomes involves several
steps. The process begins with the formation of double-membrane structures
(phagophores) around the cargo molecules (damaged organelles and macromolecules,
misfolded proteins, pathogens, etc.) in the cytoplasm. The process is induced by
formation of the ATG13-ULK-1/2-FIP200 complex at the autophagosome assembly site
in mammalian cells. mTOR promotes the phosphorylation and inactivation of ATG13
and ULK-1/2 under nutrient-rich conditions. Inhibition of mTOR triggers the
formation of this complex and induces autophagy in cells. The next step is vesicle
nucleation, which is performed by type III PI3 kinases along with Beclin and
VPS15. The type III PI3 kinase subunit VPS34 interacts with the autophagic
proteins Beclin and VPS15 and forms the Beclin-VPS34-VPS15 complex, which is
recruited to the phagophore by ATG14. The Beclin-VPS34-VPS15 complex acts as a
nucleation site for most of the ATGs involved in autophagosome formation. Vesicle
elongation involves two ubiquitin-like conjugation systems, the ATG5-ATG12 and
ATG8 (LC3B in mammals) conjugation systems, which promote the lipidation of LC3B
with phosphatidylethanolamine and complete the formation of autophagosomes. Once
autophagosomes are formed, they fuse with lysosomes to form autolysosomes, which
degrade their inner constituents via
lysosomal hydrolases
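
The induction step described in this section can be summarized as a small Boolean
sketch. It is a deliberate simplification, assuming that nutrient status is the
only input and that each regulator is either fully active or fully inactive; the
names InductionState, induction_state, and autophagy_induced are hypothetical and
serve only to restate the AMPK, mTOR, and ATG13L-ULK-1/2-FIP200 relationships
given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InductionState:
    ampk_active: bool
    mtor_active: bool
    ulk_complex_active: bool


def induction_state(nutrient_rich: bool) -> InductionState:
    """Boolean simplification of autophagy induction (illustrative only)."""
    # Starvation lowers ATP and raises AMP, which activates AMPK.
    ampk_active = not nutrient_rich
    # mTOR is active under nutrient-rich conditions unless AMPK inhibits it.
    mtor_active = nutrient_rich and not ampk_active
    # Active mTOR phosphorylates ATG13L and ULK-1/2, which keeps the
    # ATG13L-ULK-1/2-FIP200 complex inactive.
    ulk_complex_active = not mtor_active
    return InductionState(ampk_active, mtor_active, ulk_complex_active)


def autophagy_induced(nutrient_rich: bool) -> bool:
    # In this sketch, induction follows directly from an active ULK complex.
    return induction_state(nutrient_rich).ulk_complex_active


for condition in (True, False):
    label = "nutrient-rich" if condition else "starved"
    print(label, induction_state(condition))
```

The later steps (nucleation, elongation, and fusion) depend on graded signals and
protein recruitment, which a Boolean model of this kind does not capture.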

1.3 Autophagy in Cancer


Autophagy dysregulation is a common phenomenon in almost all cancers, and
modulating this process is an area of great interest in cancer drug discovery. The
expression of many autophagy-associated genes is altered in cancer cells, which
leads to tumor progression. The first direct link between autophagy and cancer was
established in 1999 by Aita et al., who demonstrated that mono-allelic deletions
of beclin (an ortholog of yeast ATG6, located on chromosome 17q21) are found in
human breast and ovarian carcinoma cell lines (Aita et al. 1999). The same year,
Liang and co-workers reported that beclin promotes autophagy in autophagy-defective
yeast with a targeted disruption of apg6/vps30 and in the breast cancer cell line
MCF-7 (Liang et al. 1999).
Beclin-induced autophagy inhibits cellular proliferation, clonogenic survival, and 
tumorigenesis in mice (Liang et al. 1999). Further work by Qu et al. (2003) demon- 
strated that heterozygous disruption of beclin increases cellular proliferation, sponta- 
neous tumorigenesis, and development of HBV-induced premalignant lesions in 
mouse tumor models. Southern blots and mutational analysis of genomic DNA from
beclin (+/-) mice did not reveal any deletions or rearrangements in the remaining
normal beclin allele; hence, inactivation of only one allele is sufficient to
promote tumorigenesis. These results show that Beclin is a haploinsufficient tumor
suppressor gene that does not follow Knudson's two-hit hypothesis, in which
mutations in both alleles are required for a tumor suppressor gene to lose its
function (Qu et al. 2003).
These findings also revealed a new role of autophagy in preventing dysregulated 
growth of tumor cells besides maintaining homeostasis. Beclin is a Bcl-2 homology 
(BH)-3 domain-only protein localized throughout cytoplasm including mitochondria, 
ER, and nucleus (Kang et al. 2011). The Beclin gene maps to a region approximately
150 kb centromeric to the BRCA1 gene on chromosome 17q21 and encodes a 2098-bp
transcript with a 120-bp 5' UTR, a 1353-bp coding region, and a 625-bp 3' UTR
(Aita et al. 1999). Beclin contains three domains: a short N-terminal BH3 domain
(residues 105-125), a central coiled-coil segment (residues 144-269), and a
C-terminal evolutionarily conserved domain (residues 244-337) (Sinha and Levine
2008; Huang et al. 2012). The Bcl-2 family proteins Bcl-2 and Bcl-xL regulate
autophagy by binding to the BH3 domain of Beclin and inhibiting its association
with class III PI3 kinase, an important autophagy regulator in cells (Pattingre
et al. 2005; Ku et al. 2008). Furthermore, phosphorylation of the BH3 domain at
the threonine 108 residue by the proapoptotic kinase Mst1 stabilizes Beclin-Bcl-2
interactions and inhibits the formation of the ATG14L-Beclin1-Vps34 autophagic
complex (Maejima et al. 2013). Another proapoptotic BH3-only protein, BCL2L11 (also
known as BIM) inhibits autophagy in cells by interacting with Beclin and facilitates its 
binding with dynein protein DYNLL1. Starvation induces BIM phosphorylation 
through MAPK8/JNK pathway and abolishes BIM-DYNLL1 interactions, allowing 
dissociation of Beclin from BIM, and induces autophagy (Luo and Rubinsztein 2013). 
The central coiled-coil domain of Beclin is required for its interactions with autophagy 
proteins ATG14 and ultraviolet radiation resistance-associated gene (UVRAG) which 
forms Beclin-ATG14 or Beclin-UVRAG heterodimers during autophagy (Li et al. 
2012). The third evolutionary conserved domain of Beclin is required for its binding 
with Vps34 and lipid membranes (Furuya et al. 2005; Huang et al. 2012). Mutations
in this region hinder Beclin binding to membranes and compromise autophagy;
however, they have no effect on Beclin interactions with other autophagy mediators
such as UVRAG and ATG14 (Huang et al. 2012; Fu et al. 2013). Mutations in the
Beclin gene are found in many cancer types and have been associated with poor
prognosis. In prostate carcinoma, low oxygen and androgen deprivation trigger AMPK
activation, which induces autophagy via Beclin activation. The induced autophagy
is protective in nature, and its pharmacological or genetic inhibition induces
apoptosis in these tumor cells (Chhipa et al. 2011). Furthermore, Beclin and its
counterpart LC3B are involved in the pathogenesis of benign prostatic hyperplasia
(BPH) and promote androgen independence in prostate cancer cells (Liu et al. 2013a).
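
The contrast drawn in this section between Knudson's two-hit hypothesis and a
haploinsufficient tumor suppressor such as Beclin can be stated as a simple rule.
The sketch below is illustrative only: it assumes that a gene's tumor-suppressive
function depends solely on the number of functional alleles, and the function name
suppressor_function_lost and its parameters are hypothetical.

```python
def suppressor_function_lost(functional_alleles: int, haploinsufficient: bool) -> bool:
    """Return True if tumor-suppressive function is lost under this simple rule.

    Classic two-hit suppressor: one functional allele is enough.
    Haploinsufficient suppressor: both alleles are needed.
    """
    if functional_alleles not in (0, 1, 2):
        raise ValueError("a diploid cell carries 0, 1, or 2 functional alleles")
    alleles_required = 2 if haploinsufficient else 1
    return functional_alleles < alleles_required


# One allele inactivated, as in the beclin (+/-) mice described above.
print(suppressor_function_lost(1, haploinsufficient=True))   # True
# The same genotype for a gene that follows the two-hit hypothesis.
print(suppressor_function_lost(1, haploinsufficient=False))  # False
```

Under this rule a single inactivated allele is sufficient to lose the function of
a haploinsufficient suppressor, whereas a two-hit suppressor tolerates it.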





1.4 Autophagy-Associated Genes: 
Mutations and Role in Cancer 


To date, 32 autophagy-related (ATG) genes have been discovered, of which 18 are
directly involved in autophagosome formation upon starvation (Mizushima et al.
2011). Most ATG genes are evolutionarily conserved between yeast and mammals.
Somatic mutations in ATG genes are frequently observed in different cancers.
Frameshift mutations in ATG2B, ATG5, ATG9B, and ATG12 are found in gastric and
colorectal carcinomas (Kang et al. 2009). Such mutations are far more frequent in
carcinomas with high microsatellite instability than in those with low
microsatellite instability. DNA sequence analysis of gastric and colorectal
carcinoma patients found single-base deletion mutations in exon 20 of ATG2B and in
exons 8 and 10 of ATG5, identical deletion mutations in exon 1-1, and three
identical deletion mutations in exon 1-2 of ATG9B (Kang et al. 2009). Another
autophagy gene, ATG12, is commonly
mutated in breast cancer cells targeted against HER2-based therapies. ATG12 is
upregulated in trastuzumab-resistant HER2-positive breast cancer cell lines as
compared to trastuzumab-sensitive cell lines (Cufi et al. 2012). Quantitative
real-time PCR-based arrays of 84 autophagy genes in trastuzumab-responsive SKBR3
and trastuzumab-refractory JIMT1 breast cancer cell lines reveal overexpression of
ATG12 in JIMT1 cells as compared to SKBR3 (Cufi et al. 2012). Genetic knockdown of
ATG12 by small hairpin RNA sensitizes JIMT1 breast cancer cells to trastuzumab and
HER1/HER2 tyrosine kinase inhibitors. Trastuzumab treatment showed a strong tumor
growth inhibitory effect in ATG12-shRNA/JIMT1 xenograft animal models as compared
to xenografts of JIMT1 cells expressing wild-type ATG12 (Cufi et al. 2012).
Additionally, autophagy gene UVRAG is
found to be mutated in many cancers (Liang and Jung 2010; Ionov et al. 2004). Frameshift 
mutations in UVRAG are present in colorectal and gastric carcinomas with high micro- 
satellite instability (Kim et al. 2008). UVRAG is a Beclin-interacting protein that associ- 
ates with Beclin-VPS34-VPS15 complex and promotes vesicular trafficking and 
autophagosome formation. UVRAG suppresses the proliferation and tumorigenicity in 
human colon cancer cells (Liang et al. 2006). Genetic silencing of UVRAG or Beclin by 
specific siRNA increases radiation-induced DNA double-stranded breaks and apoptotic 
cell death in 5-fluorouracil (5-FU)-treated irradiated colorectal cancer cells.
UVRAG and Beclin interact with each other during DNA repair, and UVRAG mutants that
are unable to bind Beclin show greater DNA damage during irradiation than cells
expressing normal UVRAG. Furthermore, suppression of Beclin and UVRAG increases
centrosome number in cells, which leads to spindle malformations and chromosome
segregation errors (Park et al. 2014). Abnormal UVRAG expression along
with BRCA1, BECN1, CCND1, and PTEN genes has been associated with human
breast carcinogenesis. The mRNA levels of these genes are downregulated in breast 
cancer cells as compared to normal breast tissues and linked with the pathogenesis of the 
disease (Wu et al. 2012). In non-medullary thyroid carcinoma (NMTC), genetic
variations in autophagy genes increase susceptibility for cancer progression and
outcome. NMTC patients show a statistically significant relation between ATG5
genetic variants and susceptibility for NMTC. The G allele of the ATG5 rs2245214
SNP is mutated in NMTC patients and shows association with prognosis of the
disease (Plantinga et al. 2014). Another important link between autophagy and
thyroid carcinogenesis is a single nucleotide polymorphism known as the Thr300Ala
polymorphism (threonine at position 300 is replaced by alanine, rs2241880) in the
ATG16L gene, which increases susceptibility for thyroid cancer. One possible
mechanism for this association between ATG16L and thyroid cancer is the modulation
of the pro-inflammatory cytokine IL-1β by ATG16L, which hinders its
antiproliferative effect in thyroid cancer cells (Huijbers et al. 2012).
Furthermore, the Thr300Ala polymorphism in ATG16L is also associated with an
increased risk of developing colorectal carcinoma, and patients carrying the less
common GG genotype are at higher risk than those carrying the more common AA
genotype (Nicoli et al. 2014). Additionally, colorectal cancer patients show
enhanced expression of ATG10,
which is associated with lymphovascular invasion and lymph node metastasis. ATG10 is 
highly upregulated in patients with sporadic colorectal cancer and is involved in metas- 
tasis and tumor invasion (Jo et al. 2012). Patients who did not express ATG10 have sig- 
nificantly higher disease-free survival and overall survival rate than those bearing 
ATG10-expressing tumors. Colorectal carcinoma cell lines AMC5, LoVo, SW480,
SW48, HCT15, DLD1, RKO, and CaCo2 show higher ATG10 expression as compared
to the normal colon cell line CCD841. Silencing of ATG10 with siRNA suppresses
cell proliferation in HCT116 cells (Jo et al. 2012). Furthermore, mutations in
5q14 regions of ATG10 are found in ovarian (Ramus et al. 2003), gastric (Oga et
al. 2001), and pancreatic cancer (Shiraishi et al. 2001). Two ATG10-linked SNPs,
rs1864182 and rs10514231, are associated with the risk of developing breast
cancer. These genetic variants of ATG10 increase susceptibility to breast cancer
in the Chinese population (Qin et al. 2013). Furthermore, ATG10 expression is
upregulated in mesenchymal stem cells along with ATG12 and LC3B in serum-starved
breast cancer cells. The increased expression of these proteins supports cell
survival and growth by providing energy and secreting anti-apoptotic proteins
(Sanchez et al. 2011). Serum starvation decreases the proliferation of the breast
cancer cell line MCF-7; however, co-culture with normal or serum-starved
mesenchymal stem cells increases its survival rate and proliferation. Inhibition
of autophagy with the autophagy inhibitors chloroquine (CQ) and bafilomycin or by
Beclin silencing decreases cell survival to a great extent, signifying the
protective role of autophagy during stress (Sanchez et al. 2011).





1.5 Genetic Regulation of Autophagy in Tumor 
Suppression and Promotion 


After the discovery of Beclin having both tumor-suppressive and autophagy-inducing 
functions, many autophagy genes have been discovered, and mutations in these 
genes have been linked to tumor progression and growth. Mutations and deletions of 
three critical autophagic genes, ATG2A (1%), ATG7 (2%), and ATG13 (5%), are 
found in nasopharyngeal carcinoma (NPC) patients (Lin et al. 2014). Although these
changes are not significant, this is the first report about such genetic lesions in the 
ATG genes of cancer cells. Another important autophagy protein that has tumor- 
suppressive functions is ULK-1, a member of ATG13L-ULK-1/2-FIP200 complex. 
The mRNA and protein levels of ULK-1 are lower in breast cancer tissues as com- 
pared to matched normal tissues. Immunohistochemical staining of ULK-1 in 298 
nonmetastatic invasive breast cancer tissues revealed lesser expression of ULK-1 in 
70% of cases, whereas the adjacent noncancerous tissues have moderate to strong 
expression. The diminished ULK-1 expression is associated with reduced autopha- 
gic capacity and the progression of the disease (Tang et al. 2012). These findings also 
suggest the use of ULK-1 as a novel prognostic biomarker for breast cancer patients. 
ULK-1 and its counterpart ULK-2 are the transcriptional targets of tumor suppressor 
protein p53 in cells. Their transcription is upregulated by p53 during DNA damage 
which induces autophagy in cells. Similarly, DNA-damaging agents etoposide and 
camptothecin induce autophagy through this mechanism triggering cell death (Gao 
et al. 2011). Furthermore, treatment of human colon cancer cell lines with different 
p53 status including HCT116 (p53 wild-type cells), HCT116/p53KO (p53 knockout 
cells), RKO and RKO-E6 (p53-blunted), and human bone osteosarcoma epithelial 
cells (wild type or knock out p53) with DNA-damaging agent camptothecin shows 
reduced expression of ULK-1 in p53 null cells as compared to p53-positive cells. 
Additionally, ectopic expression of ULK-1 enhances autophagy in U2OS cells and
shows an additive effect with rapamycin on autophagic cell death. Conversely, ULK-1
knockdown attenuates ectopically expressed p53-mediated autophagy and cytotoxicity
in these cells (Gao et al. 2011). However, the role of ULK-1 in cancer is
controversial, and the relationship between its expression and disease prognosis
varies in different
tumor types. In some tumors, the high expression of ULK-1 is associated with the 
severity of the disease and overall survival time in patients (Jiang et al. 2011). ULK-1 
protein levels are upregulated in esophageal squamous cell carcinoma (ESCC) cell 
lines and tumor samples as compared to normal esophageal cells and tissues. ESCC 
cell lines EC109, KYSE140, KYSE510, and KYSE520 have high ULK-1 expression
as compared to the normal esophageal cell line NE1 (Jiang et al. 2011). The
tumor-to-normal ratio of ULK-1 mRNA isolated from ESCC patients and normal persons
is approximately 0.68 to 1.44-fold, signifying the correlation of ULK-1 with
cancer progression. Additionally, the upregulated expression of ULK-1 is inversely
correlated with overall survival time in patients. Silencing of ULK-1 by
gene-specific siRNA induces cytotoxicity and triggers apoptosis in ESCC cell lines
(Jiang et al. 2011). These results further point out the protective role of
autophagy in cancer cells
and the response of tumors against cancer therapy. The upregulation of ULK-1 
expression is also found in hepatocellular carcinoma (HCC) patients, and it is signifi- 
cantly associated with tumor size and progression. Patients with low ULK-1 expres- 
sion have longer survival time than those with high ULK-1 expression (Xu et al. 
2013). One of the reasons behind the increase in ULK-1 transcription in solid
tumors is hypoxia. Exposure of cells to hypoxic conditions induces the unfolded
protein response (UPR) and HIF-1 activation, which trigger ULK-1 mRNA
transcription (Schaaf et al. 2013). The UPR, or ER stress, activates ATF4
(activating transcription factor 4), which directly binds to the promoter region
of the ULK-1 gene and increases its transcription (Pike et al. 2013). Upregulated
ULK-1 induces autophagy, which promotes growth and survival of tumor cells during
hypoxia. Ablation of ULK-1 or
ATF4 in epidermoid carcinoma cell line A431 and breast cancer cell line MCF-7 
suppresses autophagy and reduces clonogenic survival of cells, decreases cellular 
ATP levels, increases cell apoptosis, and reduces spheroid growth (Pike et al. 2013). 
Loss of function of the autophagy protein FIP200 has been found in many cancers.
FIP200 is involved in various cellular processes, including cell survival, cell
growth, cell proliferation, embryonic development, metastasis, and differentiation
(Gan and Guan 2008). The FIP200 gene is located on chromosome 8q11, spanning
53,535,016 bp to 53,658,403 bp from pter, and comprises 123,388 bases. This region
contains several loci of presumptive tumor suppressor genes, and loss of
heterozygosity of this region has been linked with various tumor types. Loss of
function of this region
is present in prostate cancer (Perinchery et al. 1999), breast cancer (Dahiya et al. 
1998), colorectal cancer (Staub et al. 2006), hepatocellular carcinoma (Katoh et al. 
2005), and ovarian cancer (Dimova et al. 2009). FIP200 deletion in mammary epi- 
thelial cells suppresses breast cancer initiation, progression, and metastasis (Wei 
et al. 2011). Conditional knockout of FIP200 gene in mouse model of breast cancer 
decreases tumor burden and increases overall survival time as compared to control 
mice containing functional FIP200 gene (Wei et al. 2011). Furthermore, these mice 
show fewer metastatic nodules as compared to control mice. Deletion of FIP200 
causes autophagy defects in MMTV-PyMT transgenic mice (conditional knockout 
mice or CKO) like accumulation of large ubiquitin-positive or p62-positive aggre- 
gates, deformed mitochondria, and deficient LC3 accumulation. These mice show 
reduced cell proliferation, cell cycle arrest, decreased anchorage-independent growth 
in soft agar, and glycolysis as compared to control mice. Additionally, FIP200 dele- 
tion in Ras-transformed primary mouse embryonic fibroblasts inhibits their prolif- 
eration, cell cycle progression, glucose uptake, lactate formation, and 
anchorage-independent growth in soft agar (Wei et al. 2011). These tumors also 
show defective autophagy and increased expression of several chemokines, including 
CXCL9 and CXCL10, that initiate increased immune surveillance (Wei et al. 2011). 
Downregulation of FIP200 by small interfering RNA triggers apoptosis 
in human glioblastoma cells, immortalized human astrocytes, and primary 
human brain MvEC. FIP200 directly interacts with proline-rich tyrosine 
kinase 2 (Pyk2), inhibits its expression in these cells, and abrogates Pyk2-mediated regulation of 
calcium ion channels and activation of MEK-ERK signaling (Wang et al. 2011). 
Another important tumor-promoting autophagy gene is lysosomal-associated membrane 
protein 1 (LAMP-1). It is located on the surface of lysosomes and endosomes 
and assists lysosome-autophagosome fusion (Eskelinen 2006). About one third 
of ovarian serous adenocarcinoma tumors show LAMP-1 overexpression in their 
cytoplasm. Expression analysis of LAMP-1 protein in normal ovarian tissue and 
ovarian adenocarcinomas of stages IIb, III, and IV reveals higher expression of 
LAMP-1 in the adenocarcinomas than in normal ovarian tissue (Marzinke et al. 2013). 
Immunohistochemistry of these tumor samples shows the presence of LAMP-1 in the 
epithelial cell cytoplasm and, to a lesser extent, on the surface of the plasma membrane. Furthermore, 
about 73 percent of LAMP-1-positive adenocarcinomas stain positively for epidermal 
growth factor receptor (EGFR) and exhibit moderate to strong EGFR 
signaling. Co-incubation of normal OV90 epithelial ovarian cancer cells with ovarian 
cancer ascites increases LAMP-1 expression and promotes cancer cell migration and 
proliferation, suggesting a role for LAMP-1 in tumor progression (Marzinke et al. 
2013). LAMP-1 and LAMP-2 overexpression and its correlation with tumor prognosis 
are also found in pancreatic carcinoma patients (Kunzli et al. 2002). LAMP-1 and 
LAMP-2 mRNA is moderately present in normal islet cells and weakly in most acinar 
cells, while it is highly upregulated in pancreatic ductal carcinoma cells. The 
upregulated expression of LAMP-1 is associated with patient survival, and pancreatic 
carcinoma patients exhibiting high levels of LAMP-1 have a significantly longer 
postoperative survival period than patients whose tumors have low to moderate 
LAMP-1 mRNA levels (Kunzli et al. 2002). The role of other autophagy genes in 
tumor suppression and promotion is summarized in Table 1.1. 


Table 1.1 Role of autophagy-associated genes in tumor suppression and promotion 

Gene name | Functions | Role in cancer 
ATG3 | Catalyzes the conjugation of LC3B and phosphatidylethanolamine (PE) | Overexpression of ATG3 inhibits cell growth and promotes apoptosis in leukemic SKM-1 cells (Wang et al. 2014a). Inhibition of starvation-induced autophagy by ATG3 silencing suppresses epithelial-mesenchymal transition (EMT) and abrogates invasiveness in HCC cell lines HepG2 and BEL7402 (Li et al. 2013) 
ATG14 | Competes with UVRAG for Beclin binding; the ATG14-Beclin complex is required for their localization to autophagosomes | Silencing of ATG14 sensitizes osteosarcoma cells to cisplatin-induced apoptosis (Zhao et al. 2014) 
ATG16 | ATG16 forms a complex with the ATG5-ATG12 conjugate; the resulting ATG16-ATG5-ATG12 complex is required for LC3 lipidation and autophagosome formation | Hypermethylation of Atg16L is associated with poor prognosis in CML against imatinib treatment (Dunwell et al. 2010) 
Bif-1 | Forms a complex with Beclin and enhances Beclin-VPS34 activity during autophagy | Bif-1-/- mice are more prone to tumorigenesis as compared to wild type (Takahashi et al. 2007). Reduced expression of Bif-1 is associated with short survival period in CRC stage I and II patients (Ko et al. 2013). Bif-1 haploinsufficiency suppresses mitophagy, induces chromosomal damage, and inhibits apoptosis (Takahashi et al. 2013) 
AMBRA-1 (activating molecule in BECN1-regulated autophagy protein-1) | AMBRA-1 binds to Beclin and potentiates the lipid kinase activity of the Beclin/VPS34 complex during autophagy | Starvation-induced autophagy is associated with AMBRA-1 upregulation in colorectal cancer cells, and its genetic silencing enhances the apoptotic effects of etoposide and staurosporine in colorectal cancer cell lines (Gu et al. 2014). AMBRA-1 is overexpressed in cholangiocarcinoma patients and associated with poor prognosis. AMBRA-1 increases Snail expression in cholangiocarcinoma cell lines and promotes EMT (Nitta et al. 2014) 
P62 | P62 binds ubiquitinated protein aggregates and promotes their degradation by the autophagy pathway via interaction with LC3B | P62 accumulation is correlated with lymph node metastasis in NSCLC adenocarcinoma patients (Inoue et al. 2012). P62 is overexpressed in cisplatin-resistant SKOV3/DDP (ovarian cancer) cells, and knockdown of p62 sensitizes cells to cisplatin treatment. P62 activates the Keap1-Nrf2-ARE pathway and induces antioxidant gene expression in SKOV3/DDP cells (Xia et al. 2014). P62 is overexpressed in breast tumors (Thompson et al. 2003) 
Etoposide-induced protein 2.4 homologue (EI24) | Autophagosome formation and clearance of protein aggregates | Mutated in aggressive breast cancers and functions as a tumor suppressor (Zhao et al. 2005). Inhibition of EI24/PIG8 in fibroblasts and breast cancer cells abrogates the apoptotic effect of etoposide (Mork et al. 2007) 
Immunity-related GTPase family M (IRGM) | IRGM is involved in autophagy-mediated immunity against pathogens. It interacts with ATG5, ATG10, SH3GLB1, and LC3 and promotes phagosome maturation (Petkova et al. 2012; Singh et al. 2006) | The IRGM rs4958847 polymorphism is associated with susceptibility to gastric cancer; carriers of the rs4958847 A allele are protected against gastric cancer development (Burada et al. 2012). The IRGM rs13361189 TC and CC genotypes are more frequent in glioma patients than in healthy individuals and are associated with increased expression of IFN-γ and IL-4, which play an important role in glioma development (Ge et al. 2014) 

1.6 MicroRNAs and Autophagy: Role in Cancer 


MicroRNAs (miRNAs) are small noncoding RNA molecules of about 20-25 nucle- 
otides in length distributed throughout the animal and plant kingdom. miRNAs 
regulate gene expression at posttranscriptional and translational levels, and dysregu- 
lation in this process is linked with various diseases including cancer. The first direct 
link between miRNA and cancer was observed in chronic lymphocytic leukemia (CLL), 
where a deletion in a small region of chromosome 13q14 leads to the downregulation 
of two miRNAs, miR15 and miR16. This deletion is present in 65% of CLL cases and is 
correlated with the pathogenesis of the disease (Calin et al. 2002). Aberrant changes 
in miRNA expression have been detected in many types of tumors, and their role in 
disease prognosis and pathogenesis continues to be explored (Price and Chen 2014). 
The abnormally expressed miRNAs target the transcripts of various tumor suppressor 
or promoter genes in cancer. Somatic mutations associated with miRNA processing 
have been found in many cancers. The miRNA profile of tumor cells differs 
from that of normal cells and can be used in cancer diagnostics (Paranjape et al. 
2009). Many autophagy genes are regulated by miRNAs, and any dysregulation in 
this process may lead to defective autophagy (Frankel and Lund 2012). Since 
autophagy defects are common in cancer, regulation of autophagy by miRNAs is an important 
area of interest that may provide a potential therapeutic advantage in cancer treat- 
ment. The first report about miRNA regulation of autophagy was published by Zhu 
et al. in 2009, where they demonstrated that miR-30a directly binds to 3’-UTR of 
Beclin and negatively regulates its expression. Inhibition of autophagy by overex- 
pression of miR-30a sensitizes tumor cells to chemotherapy-induced apoptosis. 
Chemotherapeutic drug taxol induces protective autophagy by decreasing miR-30a 
levels in cancer cells. Overexpression of miR-30a represses beclin-dependent 
autophagy and enhances apoptosis in cells (Zou et al. 2012). MiR-30d regulates 
many autophagy genes, including BECN1, BNIP3L, ATG12, ATG5, and ATG2, in 
cells and is found frequently mutated in human epithelial cancers. MiR-30d binds to 
3’-UTR sequences of these genes and inhibits their expression at transcriptional and 
translational levels (Zhang et al. 2014). Inhibition of beclin by miR-30d suppresses 
autophagy and sensitizes human anaplastic thyroid carcinoma (ATC) cells to cispla- 
tin treatment both in vitro and in vivo (Zhang et al. 2014). Furthermore, miRNA- 
induced autophagy promotes radioresistance in cancer cells. Overexpression of 
miR-23b in pancreatic tumors undergoing radiation therapy decreases ATG12 levels 
and autophagy which sensitizes these cells to treatment (Wang et al. 2013). 
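
The 3'-UTR binding described above for miR-30a and miR-30d is often screened for computationally by searching for seed matches. The following Python sketch is an illustration only and is not taken from the cited studies. It assumes that the seed is miRNA nucleotides 2-8 and reports every 3'-UTR 7-mer that is the reverse complement of that seed. The function names and both sequences are hypothetical.

```python
# Toy seed-match scan for candidate miRNA binding sites in a 3'-UTR.
# Assumption: the seed is miRNA nucleotides 2-8 (7 nt), read 5' to 3'.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions whose 7-mer pairs with the miRNA seed."""
    seed = mirna[1:8]                  # nucleotides 2-8 of the miRNA
    target = reverse_complement(seed)  # sequence expected in the UTR
    last_start = len(utr) - len(target)
    return [i for i in range(last_start + 1)
            if utr[i:i + len(target)] == target]


# Hypothetical sequences for illustration only (not real miRNA or UTR data).
toy_mirna = "AGGCUUACGAUCCGAUAGCUAA"
toy_utr = "CCAUGUAAGCCUUGAGUAAGCCA"

print(seed_sites(toy_mirna, toy_utr))
```

A seed match only nominates a candidate site; experimental validation is still needed to show that the miRNA actually regulates the transcript.
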
Additionally, dysregulation of miR-130a increases protective autophagic flux in 
cancer cells (Kovaleva et al. 2012). Ectopic expression of miR-130a inhibits cell 
proliferation and induces apoptosis in chronic lymphocytic leukemia (CLL) cell 
lines (Kovaleva et al. 2012). MiR-130a has also been shown to inhibit autophagy in 
CLL cells by downregulating the transcription of autophagy genes DICER1 and 
ATG2B. It has been observed that CLL patients nonresponsive to chemo treatment 
show reduced miR-130a expression and enhanced autophagy (Kovaleva et al. 2012). 
Another miRNA, miR-101, inhibits autophagy and sensitizes breast cancer cells to 
tamoxifen treatment (Frankel et al. 2011). MiR-101 downregulates the expression of the 
STMN1 (Stathmin/Oncoprotein 18), RAB5A, and ATG4D genes in breast cancer cells 
(Frankel et al. 2011). These genes are important regulators of autophagy, and their 
expression is highly upregulated in many tumors (Marklund et al. 1996; Rana et al. 
2008). All three genes have miR-101-binding motifs in their 3’UTRs, and point 
mutations in this region significantly impair their miR-101-dependent downregulation. 
Furthermore, combining miR-101 with the chemotherapeutic drug 4-hydroxytamoxifen 
significantly reduces autophagy and cell survival in breast cancer 
cell lines (Frankel et al. 2011). Additionally, miR-22 suppresses proliferation in 
osteosarcoma cells by targeting high-mobility group box 1 (HMGB1) protein-mediated 
autophagy. Osteosarcoma cells show high HMGB1 expression after anticancer 
therapy and induce protective autophagy, which contributes to chemoresistance 
(Li et al. 2014). Since the role of autophagy in cancer cells is controversial, its 
modulation by miRNAs also has different impacts in cancer cells. There are 
many other examples where inhibition of autophagy by miRNA prevents autophagic 
cell death in cancer cells and hence promotes their survival. Inhibition of autophagy 
by miR-25 prevents autophagic cell death in breast cancer cells. MiR-25 directly 
binds to the 3’UTR region of autophagy genes ATG14 and ULK-1 and inhibits their 
transcription (Wang et al. 2014b). Furthermore, the miR290-295 cluster prevents melanoma 
cancer cells from starvation-induced autophagic cell death. miR290-295 
inhibits autophagy induction by downregulating several essential autophagy genes 
involved in the formation of class III PI3 kinase complex, ATG12 and ATG8 conju- 
gation systems, and ULK1/ATG1 complex. Overexpression of miR290-295 in 
B16F1 melanoma cells undergoing glucose deprivation prevents persistent autoph- 
agy and hence promotes survival (Chen et al. 2012). Additionally, miR-17 prevents 
autophagy induction in glioma cells treated with temozolomide and promotes their 
survival. MiR-17 directly binds to 3’UTR region of ATG7 and prevents ATG7- 
mediated autophagy induction. Inhibition of miR-17 by using anti-miR-17 adminis- 
tration in glioma cells enhanced temozolomide-induced cytotoxicity and 
radiosensitivity (Comincini et al. 2013) as described in Fig. 1.2. 

Fig. 1.2 Regulation of autophagy by miRNAs and its role in cancer 


1.7 Epigenetic Regulation of Autophagy in Cancer 


Epigenetic regulation of DNA is an important process in cells and plays a critical 
role in replication, transcription, repair, and development. Similar to genetic changes, 
epigenetic defects play an important role in the genesis of cancer. Almost all cancers 
show epigenetic defects, and along with genetic alterations, they are fully involved 
in initiation and progression of the disease. There are two primary epigenetic mechanisms 
that occur in cells. One is DNA methylation and the other is the covalent 
modification of histones. These modifications are performed by chromatin-modifying 
enzymes in a highly regulated manner. The DNA methylation process involves 
the methylation of CpG islands that are mostly found in or near the promoter region 
of mammalian genes or in regions of repetitive DNA sequences in the chromosome, 
like centromeres, retrotransposons, etc. CpG islands are short DNA stretches of about 
1000 bp that are rich in CpG dinucleotides. In normal cells, they are usually unmethylated 
in the regions near promoter sequences and heavily methylated in other parts of the 
genome in order to prevent chromosome instability. DNA methylation is catalyzed 
by a family of enzymes called DNA methyltransferases that transfer a methyl group 
from donor S-adenosylmethionine (SAM) to carbon-5 of the cytosine residues 
(5mC) in CpG dinucleotides. Hypermethylation of CpG dinucleotides present within 
the promoter regions of genes leads to their silencing by preventing the binding of 
transcription factors to the DNA. Silencing of genes involved in tumor suppression 
by DNA hypermethylation promotes uncontrolled cell division, leading to tumorigenesis 
(Baylin 2005). Furthermore, the hypomethylation of CpG dinucleotides 
present in other parts of the genome leads to the reactivation of suppressed elements 
that cause genomic instability. The second mechanism of epigenetic regulation 
involves histone modifications like acetylation, methylation, ADP ribosylation, 
SUMOylation, phosphorylation, ubiquitination, O-GlcNAcylation, etc. of histone 
residues. These modifications regulate chromatin condensation and DNA accessibility 
in cells. However, unlike DNA methylation, different types of histone modifications 
have different impacts on the transcription of the associated DNA. For example, 
acetylation of the ε-amino group of lysine side chains of histones, catalyzed by histone 
acetyltransferases (HATs), neutralizes the positive charge of lysine and weakens the 
interactions between histones and DNA, thus making DNA accessible for transcription, 
whereas methylation of histones can lead to transcriptional repression or activation 
of DNA depending upon the mode of methylation. Dysregulation of the acetylation 
and deacetylation processes is among the most common posttranslational histone 
modification defects found in human cancers. Unlike genetic changes, epigenetic abnormalities are 
reversible in nature, and the agents that restore this balance are of great interest in 
cancer therapy. The most validated targets in cancer epigenetics are histone deacety- 
lases (HDACs) and DNA methyltransferases. Although structural mutations in 
HDACs are rare in cancer, their elevated expression has been associated with many 
tumor types. HDAC1 activity is upregulated in various cancers, including gastric, 
esophageal, colon, prostate, breast, ovarian, lung, pancreatic, and thyroid cancer, 
and is associated with the poor prognosis of the disease (Nakagawa et al. 
2007; Choi et al. 2001). Similarly, aberrant expression of other HDACs is also found 
in many tumor types. HDAC2 is upregulated in cervical (Huang et al. 2005), gastric 
(Song et al. 2005), colorectal (Zhu et al. 2004), and prostate cancer (Yin and Fu 
2013), HDAC3 in colorectal (Spurling et al. 2008) and prostate cancer (Weichert 
et al. 2008), HDAC4 in breast cancer (Ozdag et al. 2006), HDAC5 and HDAC7 in 
colorectal cancer (Ozdag et al. 2006; Zhu et al. 2011), and HDAC6 in breast cancer and oral 
squamous cell carcinoma (Zhang et al. 2004; Sakuma et al. 2006). The overexpression 
of these HDACs directly contributes to tumorigenicity and is associated with 
the poor prognosis of the disease. Most HDACs are involved in the regulation of 
autophagy in cells. HDAC1 and HDAC2 are involved in autophagosome formation 
and regulate autophagy in skeletal muscles (Moresi et al. 2012). Deletion of HDAC1 
and HDAC2 blocks autophagic flux and causes progressive myopathy in mice 
(Moresi et al. 2012). HDAC1 and HDAC2 double-knockout mice showed impaired 
autophagosome formation, mitochondrial abnormalities, and accumulation of p62. 
Conversely, ectopic expression of both alleles restores these functions in mice and 
promotes their survival (Moresi et al. 2012). HDAC6 is required for fusion of 
autophagosomes to lysosomes (Lee et al. 2010). Double-knockout HDAC6 mice 
show enhanced accumulation of autophagosomes and defective autophagy (Lee 
et al. 2010). HDAC6-mediated autophagy involves F-actin cytoskeleton structures 
that form specialized F-actin network over vesicles and lysosomes that undergo 
fusion (Gao et al. 2007; Jahraus et al. 2001). F-actin polymers are also present on 
protein aggregates that undergo autophagy-dependent degradation (Lee et al. 2010). 
The assembly of F-actin polymers over protein aggregates is facilitated by HDAC6 
substrate cortactin. Knockdown of cortactin by siRNA prevents the formation of 
these polymers over protein aggregates and inhibits their degradation. However, cor- 
tactin silencing has little impact on the distribution of F-actin polymers on lyso- 
somes (Lee et al. 2010). Furthermore, HDAC6 deacetylates LC3B and increases its 
transcription during serum starvation in cervical cancer cell line HeLa (Liu et al. 
2013b). HDAC6 expression is lower in HCC patients and is associated with the poor 
prognosis of the disease (Jung et al. 2012). The recurrence-free survival rate of HCC 
patients having low HDAC6 expression is lower than those with high HDAC6 activ- 
ity. Ectopic expression of HDAC6 induces autophagic cell death in HCC cell lines. 
The mechanism of autophagy induction by HDAC6 in cells involves the activation 
of Beclin via the c-Jun NH2-terminal kinase (JNK) pathway. Inhibition of autophagy with the 
autophagy inhibitor 3-MA or with SP600125, a JNK-specific inhibitor, effectively 
blocks HDAC6-induced cell death (Jung et al. 2012). Another histone 
deacetylase, SIRT-1 (a member of the sirtuin family of NAD-dependent deacetylases), 
induces autophagy in normal prostate cells and is required for prostate gland development 
and maintenance (Powell et al. 2011). SIRT-1 regulates the expression of 
late autophagy proteins ATG4, ATG7, and ATG8 by promoting their deacetylation. 
Deletion of SIRT-1 causes deficient autophagy and abnormal prostate development 
that leads to prostate intraepithelial neoplasia (Powell et al. 2011). Furthermore, 
SIRT-1-/- embryonic stem cells (ESCs) of human and mouse origin show lower 
expression of Beclin and LC3B and enhanced phosphorylation of mTOR substrates 
P70/85-S6 kinase and ribosomal-S6. Treatment of these cells with H2O2 results in 
enhanced mitochondrial membrane potential loss and apoptosis as compared to 
wild-type ESCs, reflecting the protective role of autophagy in ESCs during oxidative 
stress (Ou et al. 2014). Contrary to SIRT-1, SIRT-2 inhibits the basal levels of 
autophagy and abrogates the chemoresistance developed against microtubule inhibi- 
tors due to prolonged mitotic arrest (Inoue et al. 2014). Knockdown of SIRT-2 
increases the basal levels of autophagy and triggers mitotic arrest for longer periods, 
which confer resistance to microtubule inhibitors. Such type of resistance is also 
seen in rapamycin-treated or mildly starved cells in the presence of microtubule inhibitors, 
which delays post-slippage death. Silencing of autophagy genes ATG5-ATG7 
or FoxO1 (forkhead O family protein) or HDAC6 abolishes such resistance and triggers 
cell death. HDAC6 forms a complex with cytoplasmic FoxO1 and inhibits its 
activation. Disruption of the HDAC6-FoxO1 complex by stimuli like stress releases 
FoxO1 and promotes its acetylation. Acetylated FoxO1 binds to ATG7 and induces 
autophagy in cells (Inoue et al. 2014). Furthermore, the HDAC counterparts, histone 
acetyltransferases (HATs), regulate the expression of many proteins via autophagy, 
and dysregulation in this process promotes tumorigenesis. One example of such regulation is 
the acetylation of the glycolytic enzyme pyruvate kinase M2 (PKM2) on lysine 305, 
which decreases its activity and promotes its degradation via chaperone-mediated 
autophagy (Lv et al. 2011). PKM2 is highly expressed in cancer cells and is involved 
in metabolic reprogramming, which switches cells from oxidative phosphorylation to 
aerobic glycolysis and carcinogenesis (Christofk et al. 2008). PKM2 acetylation is 
performed by p300 (E1A-binding protein, 300 kDa) acetyltransferases, and it facili- 
tates its binding with HSC70 (heat shock protein), a chaperone which recruits target 
proteins to lysosomes for chaperone-mediated autophagic degradation (Lv et al. 
2011). PKM2 lysine 305 mutants accumulate glycolytic intermediates and show 
enhanced cell proliferation and tumor growth as compared to their wild-type coun- 
terparts. Hence, autophagic degradation of PKM2 through acetylation regulates cell 
cycle control and prevents tumorigenicity in cells (Lv et al. 2011). Additionally, the 
turnover of another acetyltransferase, hMOF, is controlled by autophagy in many 
cancer cells. hMOF (also known as KAT8 or MYST1) catalyzes 
the acetylation of lysine 16 of histone H4 (H4K16ac), which influences chromatin 
structure and transcription (Fullgrabe et al. 2013). Autophagy inhibits the 
acetylation of H4 in various cancer cell lines, including U1810, HeLa, and U2OS. Additionally, 
starvation or rapamycin treatment inhibits the acetylation of H4K16 in MEF cells. 
Cells deficient in ATG1, ATG5, or ATG7 show less inhibition of the H4K16ac histone 
modification after treatment with autophagy-inducing stimuli like starvation or 
treatment with rapamycin or torin (mTOR inhibitor) as compared to wild types. 
Treatment with autophagy inhibitors CQ or 3-MA abrogates autophagy-mediated 
inhibition of hMOF and induces H4K16 acetylation. Additionally, deacetylation of 
H4K16ac promotes cell death in cancer cell lines HeLa or U1810, which is prevented 
by autophagy inhibitors (Fullgrabe et al. 2013). Furthermore, autophagy is involved 
in the degradation of cytoplasmic chromatin fragments during cellular senescence. 
Senescent cells have chromatin fragments associated with γ-H2AX and H3K27me3 
histone in their cytoplasm. The autophagic process degrades these fragments and contributes 
to the stability of senescence, which may play a role in tumor suppression 
(Ivanov et al. 2013). During serum starvation, glycogen synthase kinase 3 (GSK3) 
phosphorylates and activates acetyltransferase KAT5/TIP60 in cells, which in turn 
acetylates and activates ULK-1 at lysine 162 and lysine 606 residues and induces 
protective autophagy (Lin et al. 2012). GSK3 inhibition abrogates KAT5/TIP60 activation 
in serum-deprived cells. Additionally, cells with mutant KAT5/TIP60 that 
cannot be phosphorylated by GSK3 are resistant to serum starvation-induced autophagy. 
Acetylation-defective ULK-1 fails to rescue autophagy in ULK-1-silenced 
MEF cells, indicating that acetylation is a necessary event in ULK-1 
activation (Lin et al. 2012). The loss of TIP60 leads to an accumulation of double-stranded 
breaks and induces genomic instability in cells (Chailleux et al. 2010; Murr 
et al. 2006). TIP60 mutations are found in many cancers including head and neck 
squamous cell carcinoma, breast cancer, colorectal cancer, and lymphoma (Sakuraba 
et al. 2009; Gorrini et al. 2007). 

The other major epigenetic alteration found in tumor cells is DNA methylation 
(Kulis and Esteller 2010). DNA hypermethylation of certain tumor suppressor 
genes silences their expression, whereas hypomethylation of the heterochromatin 
regions of DNA induces genomic instability and promotes tumorigenesis (Kulis and 
Esteller 2010). DNA methylation controls the expression of many autophagy genes 
in various tumor types. DNA methyltransferase G9a controls the transcription of 
key autophagy genes involved in autophagosome formation and modulates autoph- 
agy under normal growth conditions. Methyltransferase G9a is ubiquitously 
expressed in somatic cells and mainly localized in the euchromatin region of 
DNA. It is expressed in a variety of tumors including leukemia (Lehnertz et al. 
2014), prostate cancer (Kondo et al. 2008), lung cancer (Chen et al. 2010), and neu- 
roblastoma (Ke et al. 2014). G9a expression is higher in stage 4 neuroblastoma 
tumors as compared to stage 3 or 4S and is associated with tumor-related deaths 
(Ke et al. 2014). DNA methyltransferase G9a forms a complex with another methyltransferase, 
G9a-like protein (GLP), and promotes the methylation of CpG islands in 
the promoter region of autophagy genes, which represses their activation (Artal- 
Martinez de Narvajas et al. 2013). Under normal conditions, G9a associates with 
the promoters of LC3B, WIPIs (WD repeat domain phosphoinositide-interacting 
propeller proteins), and DOR (LC3B-interacting protein diabetes and obesity regulated) 
and epigenetically represses them. Silencing of G9a by siRNA in the cervical 
cancer cell line HeLa induces LC3B accumulation, which promotes autophagy 
(Artal-Martinez de Narvajas et al. 2013). Pharmacological or genetic inhibition of 
G9a inhibits cell proliferation, decreases tumorigenicity, and induces autophagy in 
neuroblastoma cells (Ke et al. 2014). Furthermore, hypermethylation of autophagy 
gene ATG16L is found in 69% of CP-CML (chronic phase chronic myeloid leukemia) 
patients. Patients having methylated ATG16L show a significantly decreased 
major molecular response rate at 12 and 18 months of imatinib treatment in comparison 
with patients with an unmethylated ATG16L gene (Dunwell et al. 2010). 
Additionally, hypermethylation and inactivation of the promoter region of the tumor 
suppressor gene ARHI (Aplasia Ras homologue member I, also known as DIRAS3) 
are found in ovarian cancer patients (Feng et al. 2008). ARHI inhibits cell proliferation 
and motility by targeting the PI3 kinase/Akt pathway and induces autophagy in 
cells via ATG4 upregulation. Serum starvation or mTOR treatment inhibits cell 
proliferation in ovarian cancer cells via ARHI activation, which also induces protective 
autophagy in cells. Inhibition of autophagy with CQ enhances cytotoxicity and 
reduces regrowth of xenografted tumors (Lu et al. 2014). 

Another important tumor suppressor and autophagy-regulating protein, PCDH17, 
is frequently methylated and silenced in almost all gastric and colorectal tumor cell 
lines as well as in 95% of primary tumors but not in normal gastric or colorectal 
tissues. PCDH17 deletion is found in only 18% of gastric and 12% of colorectal 
cancer tissues signifying the importance of its epigenetic silencing in cancer 
(Hu et al. 2013). Furthermore, ectopic expression of PCDH17 inhibits tumor growth 
in these cancer cell lines in vitro and in vivo and promotes apoptosis and autophagic 
cell death (Hu et al. 2013). 
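
The description of CpG islands in this section can be made concrete with a small calculation. The Python sketch below is illustrative and is not part of the cited work. It computes the GC content and the observed/expected CpG ratio of a sequence and applies commonly used screening thresholds (length of at least 200 bp, GC content above 50%, ratio above 0.6). These thresholds and the function names are assumptions introduced here, not values given in this chapter.

```python
# Rough CpG-island screen for a single DNA sequence.
# Assumption: thresholds follow the widely used 200 bp / 50% GC / 0.6 ratio
# convention; they are not taken from this chapter.


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    expected = (c_count * g_count) / n if n else 0.0
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three screening thresholds to one sequence."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


# Synthetic test sequences (hypothetical, for illustration only).
print(looks_like_cpg_island("CG" * 150))  # CpG-dense, 300 bp
print(looks_like_cpg_island("AT" * 150))  # no C or G at all
```

A screen like this only identifies CpG-rich stretches; whether such a region is methylated in a given tumor has to be measured experimentally.
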





1.8 Epigenetic Modifiers in Cancer and Role of Autophagy 


Several drugs targeting epigenetic regulators are already approved by the FDA for cancer 
therapy, and many are under preclinical investigation. Almost all anticancer 
agents that target such epigenetic defects induce either apoptotic or autophagic cell 
death in cancer cells. However, contrary to apoptosis, autophagy induction by these 
inhibitors has dual functions, and thus modulating this process is of great interest in 
cancer therapy. The most common HDAC inhibitor, suberoylanilide hydroxamic 
acid (SAHA), which is used in the treatment of cutaneous T cell lymphoma, induces 
robust autophagy in many cancer cells. SAHA induces caspase-independent autoph- 
agic cell death in endometrial stromal cells, chondrosarcoma cells, and tamoxifen- 
resistant breast cancer cells (Banreti et al. 2013; Lee et al. 2012). Another HDAC 
inhibitor, valproic acid, induces autophagic cell death and not apoptosis in glioma 
cells (Dong et al. 2013). The mechanism of autophagy induction involves the gen- 
eration of oxidative stress after valproic acid treatment which further activates ERK 
pathway. The blockage of ERK signaling inhibits autophagy and induces apoptosis 
in cells (Fu et al. 2010). Additionally, the combination of valproic acid with temsi- 
rolimus inhibits tumor cell growth and triggers autophagic cell death in Burkitt 
leukemia/lymphoma cell lines (Dong et al. 2013). On the other hand, autophagy 
induced by SAHA in acute myeloid leukemia (AML) cell lines promotes chemore- 
sistance, and inhibition of this process reduces the cell viability and colony-forming 
ability in AML cells (Torgersen et al. 2013). Additionally, genetic or pharmacological 
inhibition of autophagy enhanced SAHA-induced apoptosis in glioblastoma 
cells (Chiao et al. 2013). Co-treatment of HDAC inhibitor vorinostat with another 
anticancer drug sorafenib induces protective autophagy in hepatocellular carcinoma 
cells, and its inhibition by using 3-MA or beclin silencing enhances their synergistic 
effects (Yuan et al. 2014). Combination of pan-HDAC inhibitor, panobinostat, with 
CQ enhances its antitumor effects against human estrogen/progesterone receptor 
and HER2 (triple)-negative breast cancer (TNBC) cells. CQ co-treatment with panobinostat 
results in reduced tumor burden and higher survival rate in MDA-MB-231 
breast cancer xenografts (Rao et al. 2012). Some examples of autophagy 
induction during epigenetic targeting in cancer are summarized in Table 1.2. 

Table 1.2 Epigenetic targets and the role of autophagy in cancer therapy 

Anticancer agents | Specificity | Role of autophagy in chemo treatment 
Trichostatin | HDAC1 and HDAC2 | Autophagy induces cytoprotection in neuroblastoma cell lines against trichostatin treatment (Francisco et al. 2012). Inhibition of autophagy enhances trichostatin-induced apoptosis and radiosensitivity in colon cancer cells (He et al. 2014) 
Romidepsin | HDAC1 and HDAC2 | Inhibition of autophagy with CQ enhances romidepsin-induced cell death in malignant rhabdoid tumors (Watanabe et al. 2009) 
Apicidin | HDAC2 and HDAC3 | Apicidin induces apoptosis and autophagy in human oral squamous carcinoma cells, and inhibition of autophagy enhances apicidin-mediated apoptosis (Ahn et al. 2011) 
Abexinostat (PCI-24781) | HDAC1 and HDAC2 | Inhibition of HIF-1α attenuates abexinostat-induced autophagy in B-cell lymphoma cells and decreases cell survival (Bhalla et al. 2013). Autophagy blockage sensitizes resistant malignant peripheral nerve sheath tumors to abexinostat-induced apoptosis (Lopez et al. 2011) 
Sirtinol | HDAC3 | Sirtinol induces autophagic cell death in MCF-7 cells by downregulating Sirt1/2 (Wang et al. 2012a) 
Resveratrol | HDAC3 | Induces autophagic cell death in lung cancer cell line A549 which is rescued by genetic or pharmacological inhibition of autophagy (Zhang et al. 2013). Induces autophagic cell death in chronic myelogenous leukemia cells by activating JNK pathway (Puissant et al. 2010) 
MGCD0103 (Mocetinostat) | HDAC1, HDAC2, and HDAC11 | Induction of apoptosis and inhibition of autophagy mediate the therapeutic effect of MGCD0103 in B-cell chronic lymphocytic leukemia (El-Khoury et al. 2014) 
GSK343 | Histone methyltransferase EZH2 | Induces autophagic cell death in MDA-MB-231, HepG2, and A549 cells (Liu et al. 2014) 
BIX-01294 (BIX) | Euchromatic histone lysine N-methyltransferase 2 (EHMT2) | Induces autophagic cell death in estrogen receptor (ESR)-negative SKBr3, ESR-positive MCF-7, and HCT116 colon cancer cells (Kim et al. 2013) 


1.9 Posttranslational Modifications in Autophagy Genes: Impact in Cancer 


Posttranslational modifications are an important part of gene regulation and have been
associated with early and late stages of cancer. The important posttranslational
modifications in cells are phosphorylation, acetylation, methylation, ubiquitination,
farnesylation, glycosylation, and sialylation. The expression of many autophagy genes
is controlled by these posttranslational modifications, and their dysregulation
promotes tumorigenesis. The main posttranslational modifications linked to autophagy
proteins are phosphorylation, ubiquitination, and SUMOylation (McEwan and Dikic 2011).
Autophagy is regulated by a wide variety of kinases involved in the PI3K-Akt-mTOR
pathway (Jaber and Zong 2013; Wang et al. 2012b), the MAPK-ERK pathway (Wang et al.
2009), Wnt signaling (Petherick et al. 2013), JAK-STAT signaling (Jonchere et al.
2013), focal adhesion kinase (FAK) pathways (Tuloup-Minguez et al. 2011), and TGF-β
signaling (Kiyono et al. 2009). Phosphorylation of Beclin at multiple tyrosine residues
by the epidermal growth factor receptor (EGFR) tyrosine kinase enhances its binding to
negative regulators of autophagy such as Bcl2 and RUBICON and prevents the formation
of the Beclin type III PI3 kinase complex (Wei et al. 2013). The inhibition of EGFR by
tyrosine kinase inhibitors induces autophagy in non-small cell lung cancer (NSCLC)
cell lines, which in turn induces cytotoxicity. The overexpression of EGFR enhances
Beclin inactivation and promotes tumor growth, chemoresistance, and metastasis in
NSCLC (Wei et al. 2013). Phosphorylation and inactivation of ULK-1 and ATG13 by mTOR
during nutrient-rich conditions inhibit autophagy in cells (Jung et al. 2009). The
second common posttranslational modification, ubiquitination, is involved in the
selective targeting of proteins to autophagosomes, which promotes their degradation
upon autophagosome-lysosome fusion. Ubiquitinated proteins are recognized by p62, which
binds directly to LC3B via its LC3-interacting motif and thereby brings these proteins
to the autophagosomes. Another mechanism of autophagic degradation of ubiquitinated
proteins involves HDAC6, which interacts directly with polyubiquitinated proteins,
facilitates their transport to the microtubule-organizing center (MTOC), and forms
aggresomes that are cleared by autophagy (Shaid et al. 2013). Ubiquitin-dependent
autophagic degradation is an essential process by which cancer cells remove large
amounts of misfolded proteins. Inhibiting cancer cell growth by using proteasomal
inhibitors that disrupt this mechanism is an important area of research in cancer
therapy. The third important posttranslational modification in autophagy proteins is
SUMOylation, which involves the covalent attachment of a small ubiquitin-like modifier
(SUMO) to target proteins in a manner similar to ubiquitination. SUMOylation is
involved in various cellular processes such as protein stability, nuclear transport,
cell cycle progression, and transcriptional regulation (Geiss-Friedlander and Melchior
2007). SUMOylation of the type III PI3 kinase, VPS34, increases the activity of the
Beclin-VPS34 complex and enhances autophagy during stress (Yang et al. 2013).
Autophagy-inducing stress (starvation or treatment with the HDAC inhibitor
panobinostat) triggers the acetylation of heat shock protein 70 (hsp70), which binds
and recruits the SUMO E3 ligase KAP1 to the Beclin-VPS34 complex and promotes Vps34
SUMOylation at lysine 840. Vps34 SUMOylation increases its lipid kinase activity and
further promotes its association with Beclin in cells (Yang et al. 2013). Moreover,
the knockdown of hsp70 abolished Vps34-Beclin interactions and the induction of
autophagy in cells.





1.10 Perspective 


Autophagy is a tightly regulated catabolic process that is involved in a variety of
physiological processes, including removal of damaged or abnormal cellular
constituents, recycling of biomolecules for stress adaptation, host defense, cell
death, and embryonic development. The autophagy process comprises a wide protein
network and interlinked cellular pathways. Defective autophagy is found in many human
pathologies, including cancer, neurodegenerative diseases, and cardiovascular,
metabolic, and infectious diseases. Defects in autophagy-linked genes are commonly
found in cancer and lead to tumor progression and therapeutic resistance.
Autophagy-associated genes act as both tumor suppressors and promoters, and their
aberrant expression is associated with the dysregulation of cellular homeostasis and
the initiation of tumor growth. In addition to genetic and epigenetic defects,
microRNAs associated with autophagy play a vital role in cancer development. However,
the mechanisms associated with such epigenetic alterations are largely unknown,
although our understanding in this area is continuously growing. Depending on the
genetic and epigenetic background, the influence of autophagy on the fate of cancer
cells can vary between different cancers or among the cells of the same cancer. In
view of the paradoxical role of autophagy in cancer development and progression,
careful design of autophagy inhibitors or promoters can increase the therapeutic
efficacy of anticancer agents. Thus, an increased understanding of the functional
aspects of autophagy in various cancers is important in order to exploit it for
therapeutic advantage.
Cancer Genomics and Precision Medicine:
A Way Toward Early Diagnosis

2.1 Introduction


Since the first draft of the Human Genome Project was completed in April 2003,
biomedical researchers have been mining and extrapolating genomic data toward
the goal of improving human health and realizing medical benefits. The promise of
“personalized oncomedicine,” the matching of therapeutics to appropriate molecular
targets in individual cancer patients, lies in the convergence of cancer researchers,
computational biologists, and clinicians to identify the driving mutations
involved in tumor progression and metastasis and to pursue appropriate therapies. The
concept of a “cancer genome” underlying uncontrolled cell growth was conceived as
early as the late nineteenth and early twentieth centuries by Theodor Boveri
(Boveri 2008). Boveri hypothesized that malignant tumors could be the
result of a certain abnormal condition of the chromosomes arising from multipolar
mitosis. Several decades later, the discovery of the Philadelphia chromosome as the
genetic driver of chronic myeloid leukemia (CML) provided the experimental evidence
for Boveri’s hypothesis (Nowell and Hungerford 1961).

The first description of the translocation between chromosomes 9 and 22 in the
Philadelphia chromosome was reported by Janet D. Rowley in 1980 (Rowley 1980);
however, it was another 10 years before the genes involved in the rearrangement
were identified as breakpoint cluster region (BCR; chromosome 22) and v-abl
Abelson murine leukemia viral oncogene homolog (ABL; chromosome 9) (Groffen
et al. 1984). The BCR-ABL fusion protein was demonstrated to function as a
constitutively activated tyrosine kinase that stimulated proliferation of myeloid
cells, leading to the development of CML (Lugo et al. 1990). Subsequently, a new
therapeutic agent, imatinib mesylate (Gleevec), was developed that targets the kinase
domain of the fusion protein and essentially reversed the high mortality rate of CML
patients (Druker et al. 1996).

The discovery of the Philadelphia chromosome, the identification of the BCR-ABL
fusion protein, and the development of Gleevec targeting the oncoprotein together form
a classic early example of personalized medicine. While the route from the
identification of the BCR-ABL fusion to the development of Gleevec in CML represents a
linear path from basic molecular discovery to medical success, most cancers are far
more complex. Unlike CML (and most hematological cancers), where there is only a
single causative genetic lesion, most solid tumors are highly heterogeneous and many
harbor private mutation(s). However, over the last decade or so the field of medical
oncology has experienced several remarkable breakthroughs. The amplification of the
HER2/neu gene that was identified in ~20% of breast cancer patients (Schechter et al.
1984; Slamon et al. 1987) led to the development of a monoclonal antibody, trastuzumab
(Herceptin; Genentech), to treat women with HER2-positive breast cancer (Robertson
1998), and lung cancer patients who harbor specific EGFR mutations were found to
respond to gefitinib (Iressa) and erlotinib (Tarceva), which target these mutations
(Lynch et al. 2004; Pao et al. 2004). The PARP1 inhibitor olaparib demonstrated
antitumor activity in cancers associated with specific BRCA1 or BRCA2 mutations
that impair the DNA repair pathway (ClinicalTrials.gov number, NCT00516373)
(Fong et al. 2009).

The recent advances in high-throughput sequencing technologies that now
make clinical sequencing economically feasible, combined with advanced computational
approaches and higher-resolution analyses, have allowed us to obtain
an unprecedented view of the genomic landscape of mutations/aberrations in
individual cancer patients. Integrative sequencing strategies including whole-genome
sequencing, targeted whole-exome sequencing, transcriptome sequencing (RNA-Seq), and
shallow (5X-15X) paired-end whole-genome sequencing can be applied to uncover
clinically significant genetic alterations in tumor specimens of patients and to
identify targets for existing therapies or direct patients to appropriate clinical
trials.





2.2 Integrative Sequencing Strategy


Cancer arises from various genetic/molecular alterations including nucleic acid
substitutions, gene fusions and rearrangements, amplifications and deletions, and
a host of other aberrations that perturb gene expression (Stratton et al. 2009).
Although the cancer genome can harbor multiple mutations, only a few are “drivers”
that confer a clonal growth advantage, are positively selected, and are causally
implicated in cancer development against a background of “passenger” mutations.
Tumor specimens are often admixtures with varying fractions of normal tissue,
or they may contain tumor subclones; therefore high sequencing depth is required
for the detection of variants (Fig. 2.1). While whole-genome sequencing can be
employed to identify copy number alterations (CNAs) and structural rearrangements
at relatively shallow depth (Stephens et al. 2009), accurate identification of
point mutations requires greater coverage and depth (Meyerson et al. 2010).

Fig. 2.1 Integrative sequencing of tumors for personalized oncology. Schema
representing integration of whole-genome sequencing (green), whole-exome sequencing
(red) for 1-2% of the genome, transcriptome or mRNA sequencing (purple), and targeted
exome-capture sequencing (blue). Each sequencing strategy can be integrated for
detecting genetic aberrations in cancer tissues including structural variants, CNVs,
splice variants, point mutations, and gene expression

Somatic mutations, as distinct from inherited DNA variants, are identified by
filtering out variants that are commonly inherited in human populations (>5% allele
frequency) and have been registered in databases (Stratton et al. 2009); however,
some rare inherited single nucleotide polymorphisms (SNPs) and structural variants
may not be registered. Genes in which somatic mutations are highly represented
include those encoding protein kinase families in various signaling pathways; the
MAPK/ERK pathway is an example, where upstream mutations are found in cell
membrane-bound receptor tyrosine kinases such as EGFR, ERBB2, FGFR1,
FGFR2, FGFR3, PDGFRA, and PDGFRB as well as in the downstream cytoplasmic
components NF1, PTPN11, HRAS, KRAS, NRAS, and BRAF (Johnson and
Lapadat 2002).
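
The germline filtering step described above can be illustrated with a minimal Python
sketch. It is not the pipeline used in any of the cited studies: the `Variant` record,
the `population_af` lookup (a stand-in for a variant database), and all coordinates and
frequencies are hypothetical. The optional matched-normal subtraction is added here only
to show how a rare inherited variant that is not registered in a database could still be
removed.

```python
from dataclasses import dataclass

# Threshold taken from the text: variants above 5% population allele
# frequency are treated as commonly inherited.
COMMON_AF_THRESHOLD = 0.05


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_somatic(tumor_variants, population_af, normal_variants=frozenset(),
                      threshold=COMMON_AF_THRESHOLD):
    """Return tumor variants that survive germline filtering.

    population_af maps Variant -> allele frequency for registered variants
    (hypothetical stand-in for a database lookup).
    normal_variants holds variants seen in the patient's matched normal sample.
    """
    kept = []
    for v in tumor_variants:
        if v in normal_variants:
            continue  # also present in normal tissue, so not somatic
        af = population_af.get(v)  # None when the variant is not registered
        if af is not None and af > threshold:
            continue  # registered as a commonly inherited variant
        kept.append(v)
    return kept


# Toy data: coordinates and frequencies are made up for illustration.
common = Variant("chr1", 1000, "A", "G")
unregistered = Variant("chr2", 2000, "C", "T")
private_germline = Variant("chr3", 3000, "G", "A")

tumor = [common, unregistered, private_germline]
population_af = {common: 0.30}

without_normal = candidate_somatic(tumor, population_af)
with_normal = candidate_somatic(tumor, population_af,
                                normal_variants={private_germline})
print(len(without_normal), len(with_normal))
```

In this toy run the database filter alone leaves the unregistered inherited variant
among the candidates, which is the limitation noted in the text; subtracting the
matched normal removes it.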

The International Cancer Genome Consortium (ICGC; http://www.icgc.org/home)
undertook the task of comprehensively characterizing somatically acquired genetic
alterations in at least 50 different classes of cancer, including those with the highest
global incidence and mortality (Stratton et al. 2009). Currently, the ICGC has
received commitments from various funding organizations in Asia, Australia,
Europe, North America, and South America for 74 project teams in 17 jurisdictions
to study over 25,000 tumor genomes, including cancers of the biliary tract, bladder,
blood, bone, brain, breast, cervix, colon, eye, head and neck, kidney, liver, lung,
nasopharynx, oral cavity, ovary, pancreas, prostate, rectum, skin, soft tissues,
stomach, thyroid, and uterus. The genomic data generated by the participating ICGC
members listed in Table 2.1 are made available by the Data Coordination Center
through the ICGC website (www.icgc.org).
Table 2.1 List of the participating International Cancer Genome Consortium (ICGC) members
for generating genomic data from multiple cancer types

ICGC members | Analyzed tumors
Australia | Ovarian and pancreatic cancer
Canada | Pancreatic, pediatric brain, and prostate cancer
China | Bladder, esophageal, gastric, and renal cancer
European Union/France | Renal cancer
France | Liver cancer
Germany | Blood, brain, and prostate cancer
India | Oral cancer
Japan | Liver cancer
Saudi Arabia | Thyroid cancer
South Korea | Blood and lung cancer
Spain | Blood cancer
UK | Blood, bone, breast, esophageal, lung, prostate, and skin cancer
USA | Bladder, blood, brain, breast, cervical, colon, gastric, head and neck, liver, lung, ovarian, pancreatic, prostate, rectal, renal, skin, thyroid, and uterine cancer





2.3 Whole-Genome Sequencing


Whole-genome sequencing provides comprehensive characterization of somatic
and germline mutations in a specimen. The most commonly used methods to make
single nucleotide variant calls are (1) comparison with other sequenced genomes
via the Single Nucleotide Polymorphism database (dbSNP) and other resources for
variant discovery such as the 1000 Genomes Project (www.1000genomes.org) and
(2) critical assessment of the remaining variant sites by comparison of the tumor and
matched normal genomes. This approach also takes into consideration two primary
measures to distinguish high- from low-quality variants (Mardis and Wilson 2009):
first, a cumulative base-calling quality value that is summed from the individual
quality values of each base identifying the putative variant (assigned by
Illumina’s analysis pipeline, the Consensus Assessment of Sequence and
Variation, or CASAVA, software) and second, a mapping quality value assigned by
MAQ (Mapping and Assembly with Quality) that assesses the genome-wide uniqueness
of each aligned read (Li et al. 2008). CASAVA performs genome builds, SNP
calling, insertion/deletion (indel) detection, and read counting on the data
generated from one or more runs across a broad range of sequencing applications
(Table 2.2). Additionally, MAQ performs both read mapping and genotype calling
on simulated and real data by utilizing mate-pair information, and it estimates the
error probability of each read alignment.

Table 2.2 Key terminologies commonly used in next-generation sequencing

Key terminologies | Definition
Number of reads | Total amount of sequence data output by the instrument
Coverage | The average number of reads that align to or “cover” known reference bases
Sequencing depth | The total number of bases sequenced and aligned at a given reference base position
Read length | Number of base pairs of a given read
Error rate | Overall error rates are calculated by dividing the total number of errors by the number of known bases in the reference genome
Paired-end reads | A technology that obtains sequence reads from both ends of a DNA fragment template to generate high-quality, alignable sequence data. Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts
Mate-pair reads | Mate-pair sequencing is similar to paired-end sequencing; however, the size of the DNA fragments used as sequencing templates is much longer (1000-10,000 bp). Mate-pair methods are particularly valuable for joining contigs in de novo sequencing and for detecting translocations and large deletions (structural variants)
Multiplexing | Processing of a large number of samples on a high-throughput instrument. Sample multiplexing is a useful technique when targeting specific genomic regions or working with smaller genomes

Recently, whole-genome sequencing of bladder cancers at a median depth of ~80X
revealed recurrent protein-inactivating mutations in CDKN1A and FAT1. Moreover, the
Stampy and Platypus programs have been used for read mapping and alignment and for
calling somatic base substitutions/single nucleotide variants (SNVs), with the number
of high-confidence calls ranging from 27,490 to 121,016 (Cazier et al. 2014). A recent
study that employed exome sequencing of 50 lethal, heavily treated metastatic
castration-resistant prostate cancers (CRPCs) demonstrated recurrent (8.6%) mutations
in multiple chromatin- and histone-modifying genes such as MLL2 and in the AR
collaborating factor FOXA1 (3.4%) (Grasso et al. 2012). Nevertheless, evaluating
somatic mutations is often challenging in specimens with very low tumor content (the
percentage of tumor cells in a given specimen): few reads carry the variants that
distinguish the tumor from the normal sample, which limits the analysis.
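
The effect of tumor content on variant detection can be made concrete with a small
Python calculation. This is a sketch under simplifying assumptions that are not taken
from the cited studies: the mutation is clonal and heterozygous, the region is diploid
in both tumor and normal cells (so the expected variant allele fraction is half the
tumor content), reads sample alleles independently, sequencing errors are ignored, and
a call requires at least `min_alt_reads` supporting reads (a hypothetical threshold).

```python
from math import comb


def expected_vaf(tumor_content):
    """Expected variant allele fraction for a clonal heterozygous mutation
    in a diploid region, given the fraction of tumor cells in the specimen."""
    if not 0.0 <= tumor_content <= 1.0:
        raise ValueError("tumor_content must be between 0 and 1")
    return tumor_content / 2.0


def detection_probability(depth, tumor_content, min_alt_reads=3):
    """Binomial probability of seeing at least min_alt_reads variant reads
    among `depth` reads covering the mutated position."""
    vaf = expected_vaf(tumor_content)
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# Compare shallow and deep sequencing at several tumor contents.
for content in (0.1, 0.3, 0.6):
    for depth in (15, 30, 80):
        p = detection_probability(depth, content)
        print(f"tumor content {content:.0%}, depth {depth}X: P(detect) = {p:.2f}")
```

Under these assumptions, the probability of collecting enough supporting reads falls
as tumor content falls and rises with depth, which is why low-purity specimens call for
deeper sequencing.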











2.4 Transcriptome Sequencing


Transcriptome sequencing (RNA-Seq) provides a comprehensive landscape of the
expressed genome that includes all unique RNA transcripts from both coding and
noncoding regions. In addition to the sequence and identity of RNA species in a
sample, RNA-Seq can identify genomic rearrangements, copy number variations
(CNVs), focused indels, and single nucleotide mutations. Importantly, transcriptome
sequencing can also provide gene expression data with more sensitivity than
microarray experiments (Marioni et al. 2008; Sultan et al. 2008). Utilizing RNA-Seq,
Maher et al. discovered novel fusion transcripts by employing single-end reads
of various lengths. This approach nominated multiple candidate chimeras, such
as SLC45A3-ELK4, which was independently confirmed as a common “read-through”
transcript in prostate cancer (Maher et al. 2009). Two next-generation sequencing
platforms were combined, and their data were integrated to identify fusion transcripts
from cancer cell lines: first, long reads from RNA-Seq data (Roche 454) that only
partially aligned to the reference genome were identified as putative fusion
transcripts; then, short reads that spanned fusion junctions, obtained from a second
RNA-Seq dataset (Illumina Genome Analyzer), were integrated with the first dataset to
nominate candidate gene fusions. Using this approach, Maher et al. successfully
“rediscovered” previously known fusion transcripts and identified novel ones in the
prostate cancer cell lines LNCaP and VCaP and in various prostate tumor samples
(Maher et al. 2009). Recently, Kalyana-Sundaram et al. developed a bioinformatics
pipeline for explicitly detecting pseudogene transcripts from RNA-Seq data and
demonstrated genome-wide expression of pseudogenes, including pseudogenes that are
expressed ubiquitously and pseudogenes that are expressed in a lineage- and/or
cancer-specific manner (Kalyana-Sundaram et al. 2012). Briefly, they discovered a
breast-specific unprocessed pseudogene, ATP8A2-Ψ, which possibly arose from a
duplication of wild-type ATP8A2 and therefore likely harbors similar promoter
elements. Similarly, a prostate cancer-specific pseudogene, CXADR-Ψ, was revealed
using the same bioinformatics framework. CXADR-Ψ is a processed pseudogene located on
chromosome 15; the parental CXADR protein demonstrates putative tumor suppressor
functions, and its loss has been implicated in α-catenin downregulation (Pong et al.
2003).
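
The two-platform fusion nomination strategy described in this section (long reads
propose candidates, short junction-spanning reads confirm them) can be sketched in
Python as follows. This is a toy illustration, not the published pipeline: the input
structures, the function names, the `min_support` threshold, and all read counts are
hypothetical, and real data would require alignment, breakpoint resolution, and
artifact filtering that are omitted here.

```python
from collections import Counter


def putative_fusions(long_read_alignments):
    """Collect ordered gene pairs from long reads whose consecutive aligned
    segments map to different genes (i.e. reads that only partially align
    to any single gene)."""
    pairs = set()
    for _read_id, genes in long_read_alignments:
        for upstream, downstream in zip(genes, genes[1:]):
            if upstream != downstream:
                pairs.add((upstream, downstream))
    return pairs


def nominate_fusions(long_read_alignments, short_read_junctions, min_support=2):
    """Keep long-read candidates that are also supported by at least
    min_support short reads spanning the same gene-gene junction."""
    candidates = putative_fusions(long_read_alignments)
    support = Counter(j for j in short_read_junctions if j in candidates)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy inputs: read identifiers and counts are invented for illustration.
long_reads = [
    ("long_1", ["TMPRSS2", "ERG"]),
    ("long_2", ["SLC45A3", "ELK4"]),
    ("long_3", ["GAPDH", "GAPDH"]),  # aligns within one gene: not a candidate
]
short_junctions = (
    [("TMPRSS2", "ERG")] * 3      # three junction-spanning short reads
    + [("SLC45A3", "ELK4")]       # a single read: below the support threshold
    + [("ACTB", "GAPDH")]         # never proposed by a long read: ignored
)

nominated = nominate_fusions(long_reads, short_junctions)
print(nominated)
```

Requiring agreement between the two datasets is the design point: a candidate seen in
only one of them is not nominated.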

Chromosomal rearrangements that generate gene fusions represent one of the common
mechanisms for the expression of oncogenes in epithelial cancers (Chinnaiyan and
Palanisamy 2010). Oncogenic genetic rearrangements were
initially thought to be confined to hematological cancers (Mitelman 2000; Mitelman
et al. 2007). In 2005, Chinnaiyan and colleagues reported recurrent gene fusions
between the transmembrane protease serine 2 (TMPRSS2) gene and members of the
ETS family of transcription factors, predominantly ERG (v-ets erythroblastosis
virus E26 oncogene homolog (avian)), in prostate cancer, representing the first
discovery of a gene fusion in a solid tumor (Tomlins et al. 2005). Subsequently, gene
fusions were discovered in a variety of other cancers, including breast and lung
cancer (Stephens et al. 2009; Martelli et al. 2009; Natrajan et al. 2014). Gene
rearrangements of SLC45A3-BRAF (solute carrier family 45, member 3 and v-raf murine
sarcoma viral oncogene homolog B1) and ESRP1-RAF1 (epithelial splicing regulatory
protein 1 and v-raf-1 murine leukemia viral oncogene homolog 1) were discovered in
prostate cancer by employing paired-end massively parallel transcriptome sequencing
(Palanisamy et al. 2010). Importantly, these fusions are potentially “druggable.”
Furthermore, the identification of these RAF pathway gene rearrangements in a variety
of cancer types (prostate and gastric cancers and melanoma) supports the notion
that cancers should be stratified by the driving molecular alterations/genetic events
rather than by organ site.

Using whole-exome and transcriptome sequencing, a genetic rearrangement
between the transcriptional repressor NAB2 and the transcriptional activator STAT6 was
detected in all solitary fibrous tumor (SFT)/hemangiopericytoma cases tested,
establishing NAB2-STAT6 as the causative mutation of SFT (Robinson et al. 2013a).
In addition to driving mutations, clinical sequencing can also uncover mechanisms
of treatment resistance and disease progression. Recently, sequence analysis of
patients with ER-positive metastatic breast cancer revealed mutations in the
ligand-binding domain (LBD) of the estrogen receptor (ESR1) that resulted in
constitutive activity and continued responsiveness to anti-estrogen therapies in vitro.
These results suggest that activating mutations in ESR1 are a key mechanism of
acquired endocrine resistance in breast cancer therapy (Robinson et al. 2013b).
Moreover, recently identified novel variants of CDK4, LARP1, ADD3, and PHLPP2
in breast cancer are adding to the repertoire of the cancer transcriptome as well as
uncovering novel therapeutic targets (Eswaran et al. 2013). Transcriptome analysis
of various cancers has also identified recurrent novel fusions involving kinase
receptors that can potentially serve as promising drug targets (Stransky et al. 2014).

A distinct advantage of unbiased transcriptome sequencing is the ability to study
noncoding RNA species, whose role in cellular processes and disease states is
becoming increasingly appreciated. Long noncoding RNAs (lncRNAs) play a role in
normal cellular processes and are also implicated in cancer progression and metastasis
(Crea et al. 2014). Noncoding RNAs (ncRNAs) can be categorized into small (under
200 nucleotides) and large ncRNAs. The small ncRNAs include small nucleolar
RNAs (snoRNAs), PIWI-interacting RNAs (piRNAs), small interfering RNAs (siRNAs),
and microRNAs (miRNAs) (Amaral et al. 2008). Earlier, the ncRNA known
as HOTAIR was found to be aberrantly over-expressed in advanced breast and
colorectal cancer, and repressing HOTAIR expression in cancer cells attenuated
their invasive potential (Kogo et al. 2011; Wang and Chang 2011).
Thus, RNA-Seq analysis is a powerful tool for understanding the transcriptome
landscape of cancers and for molecularly stratifying subsets of cancer by mutational
classes of genetic aberrations. Further, the resulting datasets from various RNA-Seq
methodologies can provide a wide range of information such as gene expression,
methylation status, histone modifications, and genomic occupancy of transcription
factors and other regulatory protein-binding positions.





2.5 Methylated DNA Immunoprecipitation (MeDIP) 
Sequencing 


The term “epigenetics” was originally coined by Conrad Waddington to describe
heritable changes in a cellular phenotype that were independent of alterations in the
DNA sequence. All epigenetic changes such as chromatin remodeling, histone
modifications, and DNA methylation are highly regulated by a group of
chromatin-modifying enzymes. There are at least four known DNA modifications (Baylin
and Jones 2011; Wu et al. 2011) and 16 classes of histone modifications (Kouzarides
2007; Tan et al. 2011). Initially, MethylC-seq, a bisulfite conversion method, was
developed to analyze the methylome at single-base resolution (Cokus et al. 2008).
In this approach, sodium bisulfite converts unmethylated cytosines to uracils, leaving
5-methylated cytosines unchanged; upon amplification by polymerase chain
reaction (PCR), unmethylated cytosines appear as thymines and methylated cytosines
appear as cytosines (Frommer et al. 1992). The combination of next-generation
sequencing (NGS) platforms with established techniques such as chromatin
immunoprecipitation (ChIP-Seq) has yielded an unparalleled view of the epigenome.
Importantly, NGS introduced a novel approach to assess genome-wide epigenetic
changes in an unbiased manner without the limitations of probe-based
microarray platforms. One of the most prevalent epigenetic alterations in cancer is
the methylation change that occurs within CpG islands, which are present in 70% of all
mammalian promoters. CpG island methylation plays a critical role in transcriptional
regulation, and it is commonly altered during malignant transformation
(Baylin and Jones 2011). Genome-wide mapping of CpG methylation using NGS
platforms has confirmed that ~5-10% of normally unmethylated CpG promoter
islands become abnormally methylated in various cancer genomes (Ateeq et al.
2008; Szyf 2005). Moreover, CpG hypermethylation of promoters affects not only
the expression of protein-coding genes but also the expression of various noncoding
RNAs, some of which have a role in oncogenesis (Baylin and Jones 2011).
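
The bisulfite conversion logic described above (unmethylated C read as T, methylated
C read as C) can be illustrated with a short Python sketch. It is a simplified
simulation rather than an implementation of MethylC-seq: it assumes a single forward
strand, complete conversion, no sequencing errors, and a read already aligned
base-for-base to the reference. The sequence and methylated positions are made up.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand:
    unmethylated C is read as T, methylated C stays C."""
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Infer methylation at each reference cytosine from an aligned,
    bisulfite-converted read. Returns {position: True/False}."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True    # protected from conversion: methylated
        elif read_base == "T":
            calls[i] = False   # converted: unmethylated
        # any other base is treated as uninformative and skipped
    return calls


# Toy example: cytosines at positions 1, 4, and 5; positions 1 and 5 methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1, 5})
calls = call_methylation(reference, read)
print(read, calls)
```

Comparing each converted read against the unconverted reference is what gives the
method its single-base resolution.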

Kim et al. employed a novel deep-sequencing technique named MethylPlex, which
requires minimal DNA input, to enrich for methylated regions of the genome and
characterize the DNA methylome map of prostate cancer cells and tissues. Massively
parallel sequencing of the enriched products identified differentially methylated
regions (DMRs) and revealed novel insights regarding the genomic loci and functional
consequences of DNA methylation in cancer (Kim et al. 2011). This study
uncovered 6691 methylated promoters in prostate tissues and 2481 cancer-specific
DMRs, including several novel DMRs such as the WFDC2 promoter, which displayed
increased levels of methylation in cancer tissues compared to benign tissues and
normal prostate epithelial cells.

Whole-genome sequencing in a variety of cancers has identified recurrent somatic
mutations in numerous epigenetic regulators as well (Forbes et al. 2011; Stratton
et al. 2009). Targeted NGS resequencing of cancer genomes, including those of lymphoid
and myeloid cancers (Khan et al. 2013; Yoshida et al. 2013), found mutations within
EZH2 (enhancer of zeste 2 polycomb repressive complex 2), the catalytic subunit of
polycomb repressive complex 2, which is over-expressed in multiple cancers (Cao et al.
2008; Li et al. 2009; Varambally et al. 2008). Moreover, heterozygous missense
mutations resulting in the substitution of tyrosine 641 (Y641) within the SET domain of
EZH2 were observed in 22% of patients with diffuse large B-cell lymphoma (Morin
et al. 2010). Recurrent mutations in the histone methyltransferase MLL2 have been
discovered in ~90% of follicular lymphoma patients (Morin et al. 2011). Similarly,
mutations in UTX, a histone demethylase, were observed in up to 12 histologically
distinct cancers (van Haaften et al. 2009). Given these findings, several new drugs
against these epigenetic targets are in development (Arrowsmith et al. 2012).





Conclusions 

The current genomics era holds tremendous promise for “personalized oncomedicine.”
The recent rapid advances in high-throughput sequencing technologies
and in the field of cancer genomics are catalyzing the discovery of novel “druggable”
molecular targets, including oncogenes, protein pathways involved in signaling
cascades, and networks shown to be involved in the pathogenesis of cancer, as well
as the development of therapies against these targets. In the not too distant future,
clinical sequencing of patient tumor specimens to inform therapeutic intervention
is likely to be adopted as standard clinical practice.
Genetics of Liver Diseases

... Liver disease is best explored during the time when genes
involved in adaptation to life and disease are given their first 
“test drive” —childhood... 

Saul J Karpen, Hepatology 2008 Aug;48(2):353—4 


The pathophysiology of liver disorders involves an interaction between environmental,
genetic, and host factors. This applies to childhood liver diseases such as familial
cholestatic syndromes as well as to liver diseases manifesting in adulthood such as
alcoholic liver disease, non-alcoholic liver diseases, and viral hepatitis. Genetic
predisposition can be due to single-gene mutations (as in alpha-1 antitrypsin
deficiency), susceptibility single-nucleotide polymorphisms (as in intrahepatic
cholestasis of pregnancy), modifier genes (as in drug-induced liver disease), or
histocompatibility leucocyte antigen (HLA) associations in complex diseases such as
autoimmune hepatitis (AH). Understanding the genetic contribution to liver diseases is
important for clinicians, as the same disorder may show wide phenotypic variability,
and vital for research workers because of its therapeutic implications.
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Table 3.1 Liver disorders 
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Age at common liver diseases genetic contribution presentation 





















































Age Disease Genetic mutations 
Neonate/infants | Famlial Alagille syndrome JAG 1 (jagged 1) mutation 
cholestatic notch 2 mutation 
syndrome PFIC1 ATP8B1 mutation 
PFIC2 ABCB11 mutation 
PFIC3 ABCB4 mutation 
Metabolic Alpha-1 antitrypsin deficiency | SERPINA 1 gene mutation 
conditions Cystic fibrosis CFTR gene mutation 
3D structural | Cystic liver disease/ductal Fibrocystin gene mutation 
plate malformation 
Childhood Wilson’s disease ATP7B mutation 
Adults Hereditary 
Haemochromatosis 
Alcoholic liver disease 
Viral hepatitis 
NAFD 
3.1 Liver Diseases in Neonate/Infants 


Liver disorders are often categorised by their age at presentation: neonatal
period/infancy, childhood or adulthood (Table 3.1).


3.1.1 Familial Cholestatic Syndromes 

Alagille syndrome (ALGS): ALGS is an autosomal dominant disorder with variable
expression of the gene mutation, affecting cellular differentiation as well as tissue
development (Dhorne-Pollet et al. 1994). In the great majority of cases (approx. 95%)
the disorder is caused by mutations in the JAG1 (jagged 1) gene (Li et al. 1997). JAG1
is a ligand that binds to transmembrane notch receptors and initiates the notch
signalling pathway, which is responsible for determining cell fate and normal
organogenesis (Artavanis-Tsakonas et al. 1999). Notch signalling has an important role
in skeletal development, in the development of the intrahepatic bile ducts and the
heart, and in ductal plate remodelling. Most mutations cause premature termination of
translation, yielding truncated, inactive proteins and thereby a decrease in the amount
of normal protein. This haploinsufficiency is the postulated mechanism of ALGS. The
mutations are distributed across the entire coding region of the JAG1 gene, with more
than 430 mutations identified so far. Mutations are inherited from an affected parent in
30-50% of cases, while the remainder (50-70%) are sporadic, without obvious
genotype-phenotype correlation (Crosnier et al. 2000). They commonly include
protein-truncating frameshift and nonsense mutations (69%), splicing mutations (16%),
missense mutations (11%) and deletions (4%) (Spinner et al. 2001) (Table 3.2).

Table 3.2 Alagille syndrome-related genes

Gene name | Chromosomal locus | Protein name (a) | No. of exons | Total mutations (b) | Gene mutations: proportion in syndrome (c)
JAG1 | 20p12.1-p11.23 | Protein jagged 1 | 26 | 474 | 94-96%
NOTCH2 | 1p13-p11 | Neurogenic locus notch homolog protein 2 (notch 2) | 34 | 44 | 1-2%

(a) UniProt
(b) HGMD
(c) NCBI gene review


Table 3.3 PFIC-associated genes

PFIC type | Gene (protein) | GGT level | Pathophysiology | Clinical considerations
Type 1 | ATP8B1 (FIC1) | Low/normal | Decreased bile acid transport | PFIC1 patients: liver failure before adulthood, diarrhoea, hearing loss. BRIC patients: episodic pruritus and jaundice
Type 2 | ABCB11 (BSEP) | Low/normal | Mutation in bile acid export pump (BSEP) | Risk of hepatobiliary malignancies
Type 3 | ABCB4 (MDR3) | High | Destabilised micelles | Cholestasis


The main clinical characteristics of Alagille syndrome include: (1) cholestasis, 
due to lack of intrahepatic bile ducts which presents in the neonatal period; (2) char- 
acteristic facies; (3) skeletal abnormalities, butterfly thoracic vertebrae; (4) eye 
abnormalities, posterior embryotoxon; and (5) cardiac abnormalities like peripheral 
pulmonic stenosis. 


3.1.2 Progressive Familial Intrahepatic Cholestasis (PFIC) 


Formerly known as Byler’s disease, this is the prototype of bile canalicular transport
defects. PFIC is characterised by a low or normal level of gamma-glutamyl transpeptidase
(GGT) in PFIC type 1 or 2 and high GGT in PFIC type 3. The main clinical manifestations
are pruritus and cholestasis presenting in early infancy, but the phenotype may vary,
with presentation across all age groups (Table 3.3).
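
The GGT pattern described above can be written as a small lookup. The following is only an illustrative sketch: the category labels (`low_or_normal`, `high`) and the function name `candidate_genes` are hypothetical, the gene and protein names are those listed in Table 3.3, and nothing here is a diagnostic rule.

```python
# GGT pattern -> PFIC types, as summarised in the text (illustrative only).
PFIC_BY_GGT = {
    "low_or_normal": ("PFIC1", "PFIC2"),
    "high": ("PFIC3",),
}

# Gene and protein for each PFIC type, taken from Table 3.3.
PFIC_GENES = {
    "PFIC1": ("ATP8B1", "FIC1"),
    "PFIC2": ("ABCB11", "BSEP"),
    "PFIC3": ("ABCB4", "MDR3"),
}


def candidate_genes(ggt_category):
    """Return the genes associated with the PFIC types for a GGT category."""
    if ggt_category not in PFIC_BY_GGT:
        raise ValueError(f"unknown GGT category: {ggt_category!r}")
    return [PFIC_GENES[pfic_type][0] for pfic_type in PFIC_BY_GGT[ggt_category]]


print(candidate_genes("low_or_normal"))  # ['ATP8B1', 'ABCB11']
print(candidate_genes("high"))           # ['ABCB4']
```

Because the phenotype is variable, such a table can only narrow the list of candidate genes; it does not replace sequence analysis.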


3.1.2.1 PFIC Type 1 (FIC1 Disease) 

This condition results from abnormal regulation of bile acid homeostasis due to
recessive mutations in the ATP8B1 gene on chromosome 18q21. FIC1 disease is characterised
by persistent or recurrent jaundice (the latter as BRIC, benign recurrent intrahepatic
cholestasis), with GGT that is low in proportion to the level of cholestasis. In PFIC1,
liver disease generally progresses to liver failure before adulthood, while BRIC
patients are at the mild end of the spectrum, with recurrent episodic pruritus and
jaundice but preserved liver function. Extrahepatic manifestations are characteristic of
PFIC1 and include diarrhoea, poor growth, pancreatic insufficiency and hearing loss.

Genetic linkage analysis in patients with FIC1 disease and BRIC helped to map the
responsible gene to chromosome 18q21 (Carlton et al. 1995) and showed that a defect in
the P-type ATPase encoded by the FIC1 gene results in the disease (Fig. 3.1). The ATP8B1
gene consists of 28 exons spanning over 77 kb of genomic sequence and is expressed in
many tissues, mostly in the pancreas, intestine and liver, thus explaining the
extrahepatic manifestations (Bull et al. 1999). The exact pathophysiology has not been
identified, though it is known that FIC1 is an aminophospholipid transporter that helps
to maintain the asymmetric distribution of aminophospholipids between the inner and
outer leaflets of the plasma membrane.





Fig. 3.1 Schematic representation of FIC1 protein topology. FIC1 is an integral membrane
protein with ten predicted transmembrane domains (red) and both the amino terminus (N)
and the carboxy terminus (C) extending into the cytoplasm. The consensus domains for all
classes of P-type ATPases are depicted in yellow: the phosphorylation site (PH), in which
the aspartate residue (D) is depicted in blue, the ATP-binding domains (AB) and the hinge
(HI) domain. Three disease-associated missense mutation sites are depicted as orange
circles

The inheritance of FIC1 disease is complex, with several mutations identified in the
ATP8B1 gene, which may partially explain the marked clinical differences in FIC1 disease
(Klomp et al. 2000). Mutational analysis of families with FIC1 disease and BRIC has shown
54 disease-specific mutations, of which 24 were missense mutations, while the others
were splice site, nonsense or frameshift mutations (Klomp et al. 2004). Specific
mutations may be responsible for the disease in genetically isolated populations such as
the Amish and Greenland populations, in which specific missense mutations (G308V and
D554N, respectively; Fig. 3.1) were identified (Klomp et al. 2000). There is some
genotype-phenotype correlation: missense mutations were seen more commonly in BRIC1
(58%) than in PFIC1 (38%), while large deletions, nonsense and frameshift mutations are
more common in the severe PFIC1 phenotype (Klomp et al. 2004). One missense mutation
(I661T; Fig. 3.1), seen in BRIC patients of Western European origin, shows a dramatically
variable phenotype: these patients vary markedly in age of onset as well as in the
frequency and duration of cholestatic episodes (Klomp et al. 2004).


3.1.2.2 PFIC Type 2 (BSEP, Bile Salt Export Pump Defect) 

PFIC2 is characterised by defective bile salt excretion across the hepatocyte canalicular
membrane due to a defect in the bile salt export pump. It is caused by mutations in the
ABCB11 gene, which encodes the ATP-dependent canalicular bile salt export pump. The
ATP-binding cassette (ABC) family of proteins is important for the normal enterohepatic
circulation of bile salts and for bile salt-dependent bile flow (Fig. 3.2). PFIC2 usually
presents as cholestasis in early infancy, with rapid progression to cirrhosis in
childhood. Pruritus is the predominant clinical symptom and is disproportionate to the
level of jaundice, while extrahepatic manifestations are absent (in contrast to PFIC1).
Malignancy (hepatocellular carcinoma) has been noted as early as 10 months of age.


Fig. 3.2 Schematic representation of progressive familial intrahepatic cholestasis (PFIC) type 2.
Left panel: under normal conditions bile salt export pump (BSEP) transports bile acids into bile.
Right panel: mutation of the ABCB11/BSEP gene results in proteasomal degradation and/or
expression of a protein with low/no function, leading to reduced transport of bile acids into bile and
consequent accumulation of deleterious bile acids

Linkage analysis and homozygosity mapping in Middle Eastern families with low-GGT
cholestasis first identified this locus on chromosome 2q24 (Strautnieks et al. 1997).
Several mutations have been identified in PFIC2, and some have also been seen in BRIC,
drug-induced cholestasis and intrahepatic cholestasis of pregnancy. The ABCB11 gene has
28 exons and, so far, about 100 mutations have been identified. The majority are
missense mutations; the others include splice site, deletion, insertion and nonsense
mutations. The majority of children with BSEP mutations showed no canalicular BSEP
protein expression (Jansen et al. 1999). Severe disease phenotypes are often associated
with mutations that result in premature protein truncation or failure of protein
production, with little or no detectable canalicular BSEP expression and a higher risk
of developing hepatocellular carcinoma. Missense mutations either impair BSEP protein
processing and trafficking in the endoplasmic reticulum (e.g. p.E297G, p.D482G) or
disrupt functional domains and protein structure. Mutations such as p.N490D, p.G562D,
p.R832C and p.A1110E show detectable BSEP expression but with functional deficiency
(Hayashi et al. 2005). In milder disease, such as BRIC2, missense mutations predominate
over those leading to failure of protein production (van Mil et al. 2004).


3.1.2.3 PFIC Type 3 (PFIC3) or MDR3 Deficiency 
(Multidrug Resistance Protein 3 Deficiency) 

PFIC3 is an autosomal recessive disorder with features similar to PFIC1 and PFIC2 but
with high GGT levels. PFIC3 is caused by mutations in ABCB4 (previously called MDR3),
located on chromosome 7q21. ABCB4 is a phospholipid translocator involved in biliary
phospholipid (phosphatidylcholine) excretion and is expressed in the canalicular
membrane of the hepatocyte (Jacquemin 2001). Cholestasis results from the toxicity of
bile in which the bile salts are not solubilised because biliary phospholipids are
absent, leading to injury of the bile canaliculi and biliary epithelium
(Jacquemin et al. 2001).

The phenotypic continuum of PFIC3 spans from neonatal cholestasis to cirrhosis in young
adults. In one large series of 31 patients, ABCB4 sequence analysis revealed 17
different mutations: 11 missense mutations and 6 mutations predicting a truncated
protein (Jacquemin et al. 2001). Homozygous mutations predicting a truncated protein
resulted in the absence of canalicular MDR3 protein. This absence may arise because the
truncated protein is broken down very rapidly after synthesis, giving extremely low
steady-state levels of the protein; more likely, the premature stop codon leads to
instability and decay of the ABCB4 mRNA. The absence of ABCB4 mRNA has been observed in
several liver disease patients (de Vree et al. 1998). Missense mutations also lead to
intracellular misprocessing of MDR3 but leave some residual MDR3 function, with milder
disease (Delaunay et al. 2009) (Fig. 3.3).

There is now evidence that, in addition to PFIC3, an MDR3 defect can be involved in
intrahepatic cholestasis of pregnancy (ICP3) (Dixon et al. 2000), cholesterol gallstone
disease and drug-induced cholestasis (Rosmorduc et al. 2003; Lang et al. 2007).

Fig. 3.3 Progressive familial intrahepatic cholestasis (PFIC): types, related genes and transport
defects. BA bile acid, PC phosphatidylcholine


3.1.3 Cystic Fibrosis (CF) 


Cystic fibrosis is also an autosomal recessive disease, characterised by abnormal
epithelial electrolyte transport with elevated sweat chloride concentrations, chronic
lung disease and pancreatic dysfunction. It is a frequent, potentially fatal genetic
disorder in Caucasians but is rarely seen in the Indian subcontinent. More than 2000
mutations have been identified in the cystic fibrosis transmembrane conductance
regulator (CFTR) gene (CF Mutation Pubmed Database). In the liver, CFTR mutations cause
neonatal cholestasis, steatosis, nodular or multilobular cirrhosis and biliary tract
complications. The CFTR gene is located on the long arm of chromosome 7 and encodes a
membrane-channel protein. CFTR acts as a cAMP-dependent chloride channel in the apical
membrane of most secretory epithelia, including biliary epithelial cells
(cholangiocytes). In cholangiocytes, CFTR has an important role in biliary secretion and
bile flow.

The CFTR gene has 27 exons and encodes a membrane-bound glycoprotein of 1480 amino
acids, a member of the ATP-binding cassette (ABC) superfamily of proteins. The CFTR
protein contains two six-span membrane-bound regions, each connected to a
nucleotide-binding domain that binds ATP. A unique feature of CFTR is the R-domain
located between these two units (Fig. 3.4).

Fig. 3.4 CFTR domains

The most common CFTR mutation is delta F508 (66%), which is widespread in all ethnic
groups worldwide. CFTR gene mutations are classified into five classes based on their
effect on the CFTR protein. Class I mutations (G542X) impair CFTR messenger RNA
production; Class II mutations (delta F508) result in defective processing or
trafficking of CFTR protein to the apical membrane; Class III mutations (G551D) are
associated with defective regulation of CFTR, resulting in lack of response to cAMP
agonists; Class IV mutations (R117H) retain residual chloride channel conductance; and
Class V mutations lead to abnormal splicing of CFTR with a reduction in the number and
function of chloride channels (Fig. 3.5) (Koch et al. 2001). Class I, II and III
mutations leave no functioning CFTR at the plasma membrane and are considered ‘severe’
mutations. Class IV and V mutations are considered ‘mild’. There is no correlation
between liver disease and genotype, which indicates an important role for other genetic
or environmental modifiers.
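
The five mutation classes described above can be held in a small lookup table. The sketch below is illustrative only: the class and function names (`CftrClass`, `class_of`) are hypothetical, the example mutations and the severe/mild labels are those given in the text, and real variant classification relies on curated databases rather than a hard-coded table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CftrClass:
    defect: str             # effect on the CFTR protein, as described in the text
    example: Optional[str]  # example mutation named in the text, if any
    severity: str           # 'severe' (classes I-III) or 'mild' (classes IV-V)


CFTR_CLASSES = {
    1: CftrClass("impaired CFTR mRNA production", "G542X", "severe"),
    2: CftrClass("defective processing or trafficking to the apical membrane",
                 "delta F508", "severe"),
    3: CftrClass("defective regulation, no response to cAMP agonists",
                 "G551D", "severe"),
    4: CftrClass("residual channel conductance", "R117H", "mild"),
    5: CftrClass("abnormal splicing, fewer functional channels", None, "mild"),
}


def class_of(mutation):
    """Return the class number whose example mutation matches, else None."""
    for number, info in CFTR_CLASSES.items():
        if info.example == mutation:
            return number
    return None


for name in ("G542X", "delta F508", "G551D", "R117H"):
    number = class_of(name)
    print(name, "-> class", number, "(" + CFTR_CLASSES[number].severity + ")")
```

Only the example mutations quoted in the text are covered; any other mutation returns `None`.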


3.1.4 Alpha-1 Antitrypsin Deficiency 


Alpha-1 antitrypsin (α-1AT) deficiency is an autosomal codominant disorder characterised
by a mutant α-1 antitrypsin Z protein that fails to be secreted into body fluids because
of abnormal folding. α-1AT is a 52 kDa glycoprotein formed in hepatocytes and secreted
into the blood circulation, where it acts as the main inhibitor of destructive
neutrophil proteases: elastase, proteinase 3 and cathepsin G. The primary physiological
significance of the protein is in the lungs, where it protects the alveolar tissue from
proteolytic damage by enzymes such as neutrophil elastase; its deficiency results in
emphysema in adults. Liver disease is due to the hepatotoxic effect of the mutant
protein retained within the hepatocytes (Fig. 3.6). Children usually present with
neonatal cholestasis or chronic hepatitis, with later development of cirrhosis, but
pulmonary manifestations are not seen until adulthood. This is the most common genetic
cause of liver disease in Caucasian children but has not yet been reported in Indian
children.

Fig. 3.5 Classes of CFTR mutations. (1) Early stop codons resulting in no CFTR protein. (2)
Abnormal CFTR trafficking resulting in degradation in the endoplasmic reticulum. (3) Mature
CFTR protein is refractory to normal activation. (4) CFTR activated normally but with reduction
in single-channel conductance. (5) Splice site mutations resulting in decreased full length mRNAs
and a decrease in functional CFTR at apical membrane. ER endoplasmic reticulum

The gene encoding α-1AT is located on chromosome 14q31-32.2 and is called the SERPINA1
gene (Carrell and Lomas 2002). More than 100 allelic variants of the SERPINA1 gene are
known, including normal variants (M1, M2, M3, etc.), null variants and deficiency
variants. The α-1AT Z variant is due to a single amino acid replacement at position 342,
resulting in abnormal polymerisation of the mutant protein within the endoplasmic
reticulum (ER) and hepatocyte injury mediated via ER stress pathways. Hepatocyte nuclear
factors (HNFs), especially HNF1α and HNF4, are important for expression of human
SERPINA1 (Kalsheker et al. 2002), and humoral regulation is mediated by the cytokines
interleukin 6 (IL-6) and oncostatin M (Boutten et al. 1998).
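
The single amino acid replacement at position 342 can be illustrated with a one-base change in a codon. This is a minimal sketch under stated assumptions: the codons used (GAG for the normal allele, AAG for the Z allele) are not given in the text and are taken here as the commonly reported G-to-A change, and the function name `point_mutate` is hypothetical.

```python
# Minimal codon table: only the two codons needed for this illustration.
CODON_TABLE = {"GAG": "Glu", "AAG": "Lys"}


def point_mutate(codon, position, new_base):
    """Return the codon with the base at a 0-based position replaced."""
    if not 0 <= position < len(codon):
        raise IndexError("position outside codon")
    return codon[:position] + new_base + codon[position + 1:]


normal_codon = "GAG"                          # assumed codon 342, normal (M) allele
z_codon = point_mutate(normal_codon, 0, "A")  # assumed G-to-A change of the Z allele

print(normal_codon, "->", z_codon)
print(CODON_TABLE[normal_codon] + "342" + CODON_TABLE[z_codon])  # Glu342Lys
```

A single base change is thus enough to replace an acidic residue with a basic one, which is consistent with the altered folding and polymerisation described above.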


3.1.5 Fibrocystic Hepatorenal Diseases 

The congenital fibrocystic hepatorenal syndromes are autosomal recessive, monogenic
disorders characterised by multiple defects in the liver and kidneys. Liver disease
manifests as fibrosis, often associated with cysts lined by biliary epithelium, leading
to intrahepatic biliary tract dilatation. Cystic lesions also affect the kidneys, and
their severity determines the clinical presentation and long-term prognosis of these
patients. Conditions such as Joubert syndrome and Bardet-Biedl syndrome present during
early infancy, while hereditary hepatic fibrosis and Caroli’s disease, which are
associated with autosomal recessive polycystic kidney disease (ARPKD), may manifest
during childhood. Mutations in the polycystic kidney and hepatic disease 1 (PKHD1) gene,
located on chromosome 6p12.3-6p12.2, are responsible for ARPKD and Caroli’s disease
(Gunay-Aygun et al. 2013). PKHD1 encodes fibrocystin/polyductin (FPC), a
membrane-associated receptor-like protein that is localised to primary cilia. FPC is
predominantly expressed in the apical domain of renal tubule epithelial cells. This
protein may play a vital role in renal collecting duct and biliary differentiation
(Ward et al. 2002).

Fig. 3.6 The Z allele is the most important genetic defect in alpha-1 antitrypsin deficiency. It is a
single point mutation (G to A) in exon 5 of the gene, leading to substitution of glutamic acid by
lysine at position 342 of the protein. The Z allele results in polymerisation of the protein in the
liver, producing both hepatocyte inclusions and a decreased serum concentration. Therefore,
strategies to augment the inherited deficiency, as well as the development of small peptides that
can selectively inhibit polymerisation of the Z allele of the AAT protein in the liver, are central to
the therapeutic approach


3.1.6 Wilson’s Disease (WD)


Wilson’s disease is also an autosomal recessive disorder, characterised by toxic
accumulation of copper in the liver, brain and other organs, which can result in
end-stage liver disease and acute liver failure (Tanzi et al. 1993). However, the
disease spectrum is wide and includes asymptomatic individuals. Symptoms usually appear
in the first decades of life, with the majority of cases presenting between 5 and 35
years of age.

The ATP7B gene, located on chromosome 13 and containing 21 exons, is responsible for WD
(Koch et al. 2001). ATP7B is a copper-transporting P-type adenosine triphosphatase
(ATPase) required for efficient excretion of copper into bile. Hepatocytes release
copper after its incorporation into ceruloplasmin, a glycoprotein that carries six atoms
of copper per molecule. In the absence of ATP7B function, copper is not incorporated
efficiently, resulting in its accumulation within the hepatocytes.

To date, over 500 mutations in ATP7B have been reported. The most common mutation is
H1069Q, which has been seen in 35-45% of the European patient population (Thomas et al.
1995). Another mutation, R778L, is widely distributed among Asians (20%) (Nanji et al.
1997). In Indian patients, C271X was the most common mutation, accounting for 19% of the
total mutations (Gupta et al. 2005), while R778L and H1069Q were completely absent. All
21 exons and the promoter regions of ATP7B can be amplified by multiplex polymerase
chain reaction (PCR) and sequenced, which has become standard practice in clinical
molecular diagnosis (Fig. 3.7).
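
Mutation names such as H1069Q, R778L and C271X follow a simple shorthand: reference amino acid, residue position, replacement amino acid, with X conventionally marking a stop codon. The sketch below spells this out; the function name `describe` is hypothetical, and the lookup table deliberately covers only the residues that occur in the three mutations quoted above.

```python
import re

# One-letter to three-letter codes for the residues used in the examples only;
# 'X' is the conventional shorthand for a stop codon ('Ter').
THREE_LETTER = {"H": "His", "Q": "Gln", "R": "Arg", "L": "Leu", "C": "Cys", "X": "Ter"}

PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def describe(mutation):
    """Expand shorthand such as 'H1069Q' into 'His1069Gln (missense)'."""
    match = PATTERN.fullmatch(mutation.replace(" ", ""))
    if match is None:
        raise ValueError(f"not in <residue><position><residue> form: {mutation!r}")
    ref, position, alt = match.groups()
    if ref not in THREE_LETTER or alt not in THREE_LETTER:
        raise ValueError(f"residue not in the illustrative table: {mutation!r}")
    kind = "nonsense" if alt == "X" else "missense"
    return f"{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]} ({kind})"


for name in ("H1069Q", "R778L", "C271X"):
    print(name, "->", describe(name))
```

Read this way, the mutation most common in Indian patients (C271X) is a nonsense mutation, whereas H1069Q and R778L are missense mutations.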
Fig. 3.7 Schematic representation of copper metabolism within the hepatocyte


3.1.7 Hereditary Haemochromatosis


Hereditary haemochromatosis (HH) is the most common autosomal recessive disorder,
reported mainly in Caucasians. HH is characterised by abnormally high uptake of iron
from the gastrointestinal tract, and it manifests initially with joint discomfort,
general fatigue, low libido and abdominal pain. If left untreated, HH leads to
hypogonadism, cardiomyopathy, liver fibrosis and end-stage liver diseases such as
cirrhosis and hepatocellular carcinoma. Early diagnosis is therefore important, and
larger educational programs are required to increase public awareness of the disease.
Two major mutations of the human haemochromatosis gene (HFE), C282Y and His63Asp (H63D),
cause iron overload through reduced production of hepcidin. Hepcidin is a regulator of
iron homeostasis in hepatocytes, and insufficient hepcidin expression results in
excessive iron absorption and deposition in tissues, causing multiple organ damage and
failure (Pelusi et al. 2014; Emanuele et al. 2014; Vujic 2014). Despite the increased
iron stores in the body, enterocytes continue to absorb dietary iron.

There is a close association between HLA-A3 and HFE, as both are on chromosome 6. Other
mutations also play a role in hereditary haemochromatosis, including those in TFR2
(transferrin receptor 2), SLC40A1 (encoding ferroportin), HAMP (encoding hepcidin) and
HJV (encoding hemojuvelin), and the A736V variant of TMPRSS6 (regulating hepcidin).
However, genetic data exclude a role for transferrin, transferrin receptor and ferritin,
since their genes are not located on chromosome 6 (Wang et al. 2013; Roetto et al. 2003;
Papanikolaou et al. 2004; Panigrahi et al. 2006; Potekhina et al. 2005; Zamani et al.
2012; Del-Castillo-Rueda et al. 2012; Valenti et al. 2012a) (Table 3.4).


Table 3.4 Genes associated with iron accumulation

Gene | Locus | Mutations | Wild-type function
HFE | 6p22.2 | C282Y, recessive; H63D, recessive | Regulates interaction of transferrin receptor with transferrin
TFR2 | 7q22 | Arg455Gln; Glu60Xaa; Met172Lys; loss of Ala-Val-Ala-Gln at 594-597 | Encodes transferrin receptor 2
SLC40A1 | 2q32 | Asn144His, dominant | Encodes ferroportin
HAMP | 19q13.1 | Frameshift mutation; G71D, recessive | Encodes hepcidin
HJV (HFE2) | 1q21.1 | L101P, G320V, recessive | Encodes hemojuvelin
TMPRSS6 | 22q12.3 | A736V | Encodes matriptase 2
PNPLA3 | 22q13.31 | I148M | Encodes multifunctional enzyme having both triacylglycerol lipase and acylglycerol O-acyltransferase activity
TNF-alpha | 6p21.3 | -308G>A | Encodes multifunctional pro-inflammatory cytokine
BMP2 | 20p12 | rs235756 |

In India and the majority of Asian populations, haemochromatosis is mostly idiopathic
and the frequency of the C282Y mutation is close to zero. Five percent of Russians
showed homozygosity for C282Y with biochemical and clinical signs of HH (Roetto et al.
2003). Alcoholic liver disease patients also showed an association with the C282Y and
H63D mutations; most were homozygous for H63D, with increased total and low-density
lipoprotein cholesterol (Raszeja-Wyszomirska et al. 2010).

Heterozygous mutations of the H63D and TFR2 (transferrin receptor 2) genes were found to
be more common in Iranian patients; however, these mutations were not significantly
associated with severity of presentation in HH patients (Del-Castillo-Rueda et al. 2012).

In the rat model of hereditary haemochromatosis, however, sequencing of TFR2 revealed a
novel Ala679Gly polymorphism at a highly conserved residue, indicating the involvement
of TFR2 in haemochromatosis (Santos et al. 2011). In the same way, the A736V
polymorphism of TMPRSS6, which regulates hepcidin levels, was associated with HH and
hepatocellular carcinoma (Valenti et al. 2012a; Raszeja-Wyszomirska et al. 2010; Santos
et al. 2011). Many other gene variants are also associated with HH: the I148M protein
variant of PNPLA3 and the TNF-alpha -308G>A allelic variant were found to be associated
with steatosis, fibrosis and cirrhosis in patients with C282Y+/+ hereditary
haemochromatosis (Bartnikas et al. 2013; Valenti et al. 2012b; Krayenbuehl et al. 2006).


3.1.8 Alcoholic Liver Disease 


Alcohol overconsumption over an extended period of time leads to alcoholic liver disease
(ALD), which is the major cause of chronic liver disease in adults. The metabolism of
alcohol generates reactive intermediates, which contribute to cell and tissue damage by
altering various cell signalling pathways. ALD ranges from simple steatosis through
fibrosis to cirrhosis and hepatocellular carcinoma. Alcohol dependence (AD) is a
heritable trait (heritability >50%) and is strongly correlated with susceptibility to
excessive alcohol consumption.

The metabolism of alcohol occurs mainly in the intestine and the liver, where alcohol is
converted into acetaldehyde by cytochrome P450 2E1 (CYP2E1) and alcohol dehydrogenase
(ADH). Acetaldehyde is a highly toxic, carcinogenic substance. However, most of the
acetaldehyde is converted into acetate, which is less toxic.

Polymorphisms in alcohol dehydrogenase (ADH), such as ADH2*2 (rs1229984) and ADH3*2
(rs698), and in aldehyde dehydrogenase (ALDH), which converts acetaldehyde to acetate,
are very important. Of the 19 ALDH genes in the human genome, ALDH2, which is mainly
present in the mitochondrial matrix, has an important role in alcohol metabolism.
Candidate gene and linkage studies revealed that the alcohol dehydrogenase 1B (ADH1B)
and aldehyde dehydrogenase 2 (ALDH2) genes mediate the risk for alcoholism in Asians. In
many ethnic populations, an increased frequency of the ADH1B*2 and ALDH2*1/*1 alleles
showed genetic susceptibility to cirrhosis (Li et al. 2012). Other important variants in
alcoholics in Asia (including India) and other continents are CYP2E1*1D, CYP2E1*5
(rs3813867 and rs2031920), TNF-α (rs1800629), TNF-α (rs361525), IL-1B (rs3087258), CD-14
(rs2569190), IL-10 (rs1800872), PNPLA3 (rs738409) and glutathione S-transferase P1
(GSTP1-Val allele) (Yokoyama et al. 2013; Roy et al. 2012; Dutta 2013; Bansal et al.
2013; Wan et al. 1998; Liu et al. 2012; Zeng et al. 2013; Agrawal and Bierut 2012;
Nischalke et al. 2013; Giby and Ajith 2014; Adams et al. 2005). CYP2E1 is a major
determinant of alcohol-induced toxicity in the liver, intestine, brain and other
peripheral tissues where it is expressed (Liu et al. 2012). Meta-analysis suggested that
CYP2E1 polymorphism is associated with alcohol-induced steatosis, hepatitis and fibrosis
(Wan et al. 1998). Alcoholic cirrhotics also showed a statistically significant increase
in the PNPLA3 allele (Yokoyama et al. 2013) (Table 3.5).
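
The two metabolic steps described above, and the enzymes whose genes carry the polymorphisms discussed, can be laid out as a short pathway table. This is only an illustrative sketch: the data structure and the function names (`enzymes_acting_on`, `pathway_from`) are hypothetical, and ALDH2 is used here to stand for the ALDH step because the text singles it out.

```python
# Each step: (substrate, product, enzymes named in the text for that step).
STEPS = [
    ("ethanol", "acetaldehyde", ("ADH", "CYP2E1")),
    ("acetaldehyde", "acetate", ("ALDH2",)),
]


def enzymes_acting_on(substrate):
    """Return the enzymes that metabolise the given substrate, if any."""
    for step_substrate, _product, enzymes in STEPS:
        if step_substrate == substrate:
            return enzymes
    return ()


def pathway_from(substrate):
    """Follow the steps from a substrate to the final product."""
    chain = [substrate]
    current = substrate
    while True:
        product = next((p for s, p, _ in STEPS if s == current), None)
        if product is None:
            return chain
        chain.append(product)
        current = product


print(" -> ".join(pathway_from("ethanol")))
print(enzymes_acting_on("acetaldehyde"))
```

Laid out this way, it is easier to see why variants that alter either step change exposure to acetaldehyde, the toxic intermediate.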

In addition, the gene for the receptor of the neurotransmitter gamma-aminobutyric acid
appears to have a role in the development of alcohol dependence (Dutta 2013). Many
intrahepatic chemokines and their receptors are upregulated in alcohol-induced liver
fibrosis. A polymorphism (rs4074) of the inflammatory cytokine gene CXC-chemokine ligand
1 (CXCL1) is also considered an independent risk factor for cirrhosis and HCC in ALD
patients, even in the absence of liver damage (Zeng et al. 2013; Agrawal and Bierut
2012; Nischalke et al. 2013).

Table 3.5 Candidate genes associated with ALD

Function | Gene (a)
Alcohol metabolism | ADH1B, ADH1C, CYP2E1, CYP1A1, ALDH2
Oxidative stress | GSTM1, GSTP1, GSTT1, MnSOD, NAT, HFE
Immune reactions | TNF-α, IL-10, IL-1R, IL-1B
Fibrosis-associated factors | TGFβ1, MMP3
Modulation of steatosis | PPARγ, MTP, ApoE
Other | NFkB1, DRD2, SLC6A4, GSTP1, PNPLA3, GABRA2

(a) Some of these genes have poor association with ALD


3.1.9 Non-alcoholic Fatty Liver Disease (NAFLD) 


NAFLD affects approximately 20-40% of the population in western countries and is
becoming increasingly frequent in the Asian subcontinent, especially in India. NAFLD has
a wide spectrum, from steatosis, non-alcoholic steatohepatitis (NASH) and advanced
fibrosis to end-stage liver disease, cirrhosis and hepatocellular carcinoma (HCC) (Giby
and Ajith 2014; Adams et al. 2005; Angulo 2002; Clark et al. 2002; Chitturi et al. 2004).

It is a metabolic disorder whose pathophysiology is centrally associated with insulin
resistance (Day 2006; Guerrero et al. 2009). The majority of patients with NAFLD develop
hepatic steatosis (Wilfred de Alwis and Day 2007), and only a small group develops the
more advanced form, non-alcoholic steatohepatitis (NASH).

Besides obesity and insulin resistance, many factors contribute to the development of
NAFLD or its advanced form, and these remain unclear. Environmental and genetic factors
also play a part in the development of the disease.

Determining the genetic factors that predispose an individual to NAFLD might therefore
help in targeting preventive strategies at those at higher risk (Browning et al. 2004b).
Genetic variants associated with disease risk should also lead to the development of
non-invasive biomarkers and the identification of novel treatment targets.

Despite similar rates of obesity, African-Americans have a lower incidence of both
steatosis and cryptogenic cirrhosis than European Americans and those of Hispanic origin
(Guerrero et al. 2009). There are also increasing data suggesting that US Hispanics are
more susceptible to NAFLD than those of European descent (Browning et al. 2004a;
Williams et al. 2011).

Non-alcoholic steatohepatitis (NASH) is the progressive form of NAFLD, which often
advances from steatosis to steatohepatitis and cirrhosis and may progress to HCC. In
steatosis, large vacuoles of triglyceride fat accumulate in liver cells via the process
of lipogenesis. Mediators released from adipose tissue, such as adipokines, therefore
play an integral role in NAFLD. Adipokines regulate energy homeostasis through
lipogenesis, lipolysis and fatty acid oxidation. Fatty acid oxidation in the liver is
activated via peroxisome proliferator-activated receptor (PPAR)-α by leptin and
adiponectin (Williams et al. 2011; Namikawa et al. 2004; Bernard et al. 2000).

It has been shown that patients with NAFLD show mutations in microsomal triglyceride
transfer protein (MTP). MTP is critical for the production of very-low-density
lipoprotein (VLDL) in the liver as well as in the intestine. A G/T SNP at position -493
in the promoter region of MTP has been linked with low MTP expression, resulting in
failure to secrete triacylglycerol (TG) from the liver. NAFLD patients with the G/G
genotype have an increased chance of steatosis and NASH compared with those with the T/T
genotype (Namikawa et al. 2004; Bernard et al. 2000; Oliveira et al. 2010).

Partial loss of function in the phosphatidylethanolamine N-methyltransferase
(PEMT) gene, which is involved in the production of the phosphatidylcholine
needed for VLDL synthesis, has also been reported. In two different studies,
Japanese and American patients with biopsy-proven NASH showed an increased
frequency of the V175M allele of the PEMT gene compared with controls (Dong
et al. 2007; Song et al. 2005). These patients had lower BMI, indicating that
they were genetically predisposed to develop lean NASH (Song et al. 2005). SNPs
in other genes that regulate intrahepatic free fatty acid (FFA) and TG
synthesis, storage and export are also attractive candidates for NAFLD. In
addition, the pregnane X receptor (PXR), a well-defined transcription factor,
also has a role in lipid homeostasis and hepatic detoxification mechanisms
(Zhang et al. 2008; Zhou et al. 2006). SNPs in the PXR gene were significantly
associated with NAFLD and are considered predictors of disease severity
(Sookoian et al. 2010). A SNP in the promoter-region binding site of the
transcription factor HNF-4 affects the expression of the PXR, CYP3A4 and ABCB1
genes. Therefore, there is a high possibility that it affects lipid
homeostasis, which needs to be investigated.

Apolipoprotein (APO) genes, which encode constituents of VLDL, are also
appealing candidates for roles in NAFLD vulnerability and development; however,
data are very limited. In the Indian population, APOC3 is associated with
triglyceridemia and is considered one of the strongest factors for NAFLD
(Salamone et al. 2010). Two variant alleles in the promoter region of APOC3 are
in linkage disequilibrium and associated with NAFLD. In contrast, in Caucasians
there was no association between APOC3 variants and hepatic TG content or
insulin resistance (Kozlitina et al. 2011) (Table 3.6).

Table 3.6 Candidate genes associated with NAFLD

Gene        SNP
APOC3       T455C, C482T
MTP         -493G/T
PEMT        V175M
PNPLA3      rs738408, rs738409
FDFT1       rs2645424
COL13A1     rs1227756
EFCAB4B     rs887304
PZP         rs6487679
NCAN        rs2228603
PPP1R3B     rs4240624
GCKR        rs780094
LYPLAL1     rs12137855


3.2 Viral Hepatitis


Acute or chronic viral hepatitis due to hepatotropic viruses such as hepatitis
A, B, C, D and E is the most common cause of liver disease worldwide. In
addition to the hepatotropic viruses, other viruses such as herpes simplex
virus, cytomegalovirus, Epstein-Barr virus or yellow fever virus also cause
liver inflammation (hepatitis).

HAV spreads by the faecal-oral route and is often associated with ingestion of
contaminated food or water. It is responsible for an acute form of hepatitis
and does not cause chronicity. The patient’s immune system develops antibodies
against HAV, which confer lifelong immunity.

Hepatitis B virus (HBV) belongs to the Hepadnaviridae family and causes both
acute and chronic hepatitis. More than 350 million people are chronic carriers
of hepatitis B infection. Ninety to 95% of infected adults clear the virus
through a potent immune response; however, in 5-10% of acutely infected adults
the virus is not cleared and the infection becomes chronic. HBV infection is
transmitted through blood transfusion, tattoos, unsafe sex or from an
HBV-infected mother to her child by transplacental crossing. However, in about
half of cases, the source of infection cannot be determined.

Chronic hepatitis B patients develop specific antibodies against hepatitis B,
but the antibody titers are not high enough to clear the infection. Therefore,
continued replication of the virus together with a small amount of antibodies
is the main cause of the immune complex disease in these patients.

Clinical outcome in HBV infection is mainly decided by viral, host immunologi- 
cal and genetic factors. HBV infection influences many cellular processes through 
genetic instability such as: 


• Virus binding, entry and fusion with the cell membrane
• Modulation of host immune response
• Cause of pathological alterations in the liver
• Development of liver cirrhosis and HCC
• Mother-to-infant vertical transmission
• Resistance to antiviral therapies


Many studies have reported an association of HBV infection with HLA. HLA class
II alleles such as DRB1*1302, HLA-DR13 or DQA1*0501-DQB1*0301-DQB1*1102 are
associated with acute and/or chronic HBV infection. Several pro-inflammatory
(Th1) cytokines such as IL-2, IFN-γ and TNF-α have a major role in boosting
host immunity, leading to viral clearance. In contrast, the Th2 cytokine IL-10
serves as a potent inhibitor of Th1 effector cells in HBV infection.

Polymorphism in the IL-16 gene was found to be associated with HBV-related
HCC. The TG and GG genotypes of the rs11556218 T/G SNP in IL-16 were
significantly associated with increased risk of HBV-related HCC compared to the
TT genotype (OR = 1.96 and OR = 3.33, respectively). Previous data revealed
that subjects with the G allele appeared to have lower susceptibility to
chronic hepatitis B infection than those with the T allele (OR = 0.46). Under
the dominant model, the TG + GG genotypes seemed to be associated with lower
susceptibility to chronic hepatitis B (OR = 0.44) (Li et al. 2011). For another
SNP, rs4072111 C/T, the TT genotype was found to be associated with risk of
HBV-related HCC compared to the CC genotype (OR = 6.67).
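
To make the reported odds ratios easier to interpret, the following minimal Python sketch shows how an OR under a dominant model (carriers of the variant allele versus the reference homozygote) can be computed from genotype counts, with an approximate 95% confidence interval by the Woolf (log) method. The function name and all genotype counts are hypothetical and chosen only for illustration; they are not data from Li et al. (2011).

```python
import math


def dominant_model_or(case_counts, control_counts, ref="TT"):
    """Odds ratio for variant-allele carriers versus the reference genotype.

    case_counts and control_counts map genotype -> number of subjects.
    Returns (odds_ratio, (ci_low, ci_high)) with a Woolf 95% interval.
    """
    case_ref = case_counts[ref]
    ctrl_ref = control_counts[ref]
    # Under a dominant model every non-reference genotype counts as a carrier.
    case_carrier = sum(n for g, n in case_counts.items() if g != ref)
    ctrl_carrier = sum(n for g, n in control_counts.items() if g != ref)

    odds_ratio = (case_carrier * ctrl_ref) / (case_ref * ctrl_carrier)

    # Standard error of log(OR) from the four cells of the 2x2 table.
    se = math.sqrt(1 / case_carrier + 1 / case_ref + 1 / ctrl_carrier + 1 / ctrl_ref)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, (ci_low, ci_high)


# Hypothetical genotype counts (illustration only).
cases = {"TT": 40, "TG": 45, "GG": 15}
controls = {"TT": 60, "TG": 32, "GG": 8}

odds_ratio, ci = dominant_model_or(cases, controls)
print(f"OR = {odds_ratio:.2f}, 95% CI = {ci[0]:.2f}-{ci[1]:.2f}")
```

An OR above 1 indicates that carriers are over-represented among cases, whereas an OR below 1 (such as the 0.44 reported above) indicates that carriers are under-represented among cases.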

The programmed death 1 (PD-1) receptor and its ligand are important in viral
clearance as well as in treatment outcomes. Recently, PD-1 antagonists have
been considered as a main therapy along with antivirals and interferons. The
interaction between PD-1 and T-cell immunoglobulin and mucin domain 3 (Tim-3)
is important in immune dysfunction in chronic HBV infection. PD1 and TIM3
polymorphisms were observed to differentially and interactively predispose
HBV-infected patients to cirrhosis and HCC (Li et al. 2013) (Table 3.7).

In a large meta-analysis, polymorphisms in the PD1 and TIM3 genes at positions
+8669 and -1516, respectively, were observed to have a profound effect on HBV
chronicity.

Multivariate analysis showed that, in addition to the PD1 +8669 genotype AA and
the TIM3 -1516 genotypes GT + TT, gender, age, ALT, albumin and HBV DNA were
associated with HBV cirrhosis compared to patients without cirrhosis. The
combined presence of PD1 +8669 AA/TIM3 -1516 GT or TT was higher in pooled
cirrhosis and HCC patients than in patients without cirrhosis and HCC
(OR, 2.326; p = 0.020).

Other important polymorphisms in the coding region of the MBP gene are reported
to be involved in chronic HBV infection. However, since genetic interactions
are complex, reports from different studies are inconsistent with respect to
the effects of host genetic factors on HBV clearance or persistence.

Therefore, it is unlikely that a single allelic variant is responsible for HBV
persistence or clearance. It may be that the collective influence of several
single-nucleotide polymorphisms (SNPs) or haplotype(s) underlies synergistic
protection against HBV (Wang 2003). Studying a combined panel of genes in a
large cohort of HBV-infected patients may provide insight into the pathogenesis
of HBV infection and a rationale for new methods of diagnosis and therapeutic
strategies.


Table 3.7 Genes associated with viral hepatitis clearance and persistence

Viral hepatitis | HLA | Cytokine | IL | Other
B | DRB1*1302, DR2, DR6, DR13, DQA1*0501-DQB1*0301-DQB1*1102 | TNF-alpha, IFN-gamma | IL-2, IL16, IL28B, IL18 | PD-1, Tim-3, SPP1, CCR5, GNLY, MBL
C | DQB1*0301, DQB1*0201, DQB1*03, DQA1*03, DRB1*01, DRB1*0101, DRB1*0301, DRB1*04, DRB1*0401, DRB1*11, DRB1*12, DRB1*1101, DRB1*1104, DRB1*1302, DRB3*03, DRQ1*1101, DQA1*03, DQA1*0501, DQB1*0302, A*1101, B*57, Cw*0102, DR15 | TNF-alpha, TGF-b1 | IL28B | HFE, TYK2, SVR


Conclusions

Present genetic data on the distributions and functions of the implicated
polymorphic alleles come mostly from small patient groups and from few ethnic
groups. Future large studies should be organized internationally and include
multiple cohorts to clarify gene associations and identify other potential
candidate genes in liver diseases.


4 Implication of Pre-replication Complex Proteins in Human Disease


Abid Khan, Arindam Chakraborty, and Supriya G. Prasanth 


Proper DNA replication is essential for the maintenance of genomic integrity
during cell division. DNA replication in eukaryotes is a complex, tightly
coordinated process that has evolved several safeguards to prevent erroneous
DNA duplication and its propagation. Hence any mistakes therein, either due to
mutations in the components of the replication machinery or changes in their
expression pattern, could lead to significant genomic abnormalities, which
could manifest in a variety of diseases. In this chapter, we will discuss some
of the major disease phenotypes associated with the DNA replication machinery,
particularly focusing on the members of the pre-replication complex (pre-RC).

DNA replication begins with the loading of the pre-RC onto replication origins
during the G1 phase of the cell cycle. The process of “replication licensing”
ensures that the chromosomes replicate only once per cell cycle. Licensing
entails the sequential loading of ORC (origin recognition complex), the loading
factors Cdc6 (cell division cycle 6) and Cdt1 (chromatin licensing and DNA
replication factor 1) and the replication-associated helicase MCM2-7
(minichromosome maintenance 2-7) (Bell 2002; Bell and Dutta 2002; Dutta and
Bell 1997; Kelly and Brown 2000). Briefly, ORC and Cdc6 form a stable complex
on DNA, which then recruits Cdt1. Cdt1 loading onto origins is essential for
the recruitment of MCMs. Cdt1 levels are regulated by Geminin during the cell
cycle. At the end of mitosis, cellular Geminin levels are low, and therefore it
binds to Cdt1 at a lower stoichiometric ratio, allowing Cdt1 to be “active”,
load onto the origins and associate with ORC and Cdc6. At the G1/S boundary,
Geminin levels rise and stay stable through S-G2-M. During these phases,
Geminin binds to Cdt1 at a higher stoichiometric ratio, suppresses its
activity, and can titrate it off the pre-RC. Alternatively, Cdt1 is also known
to undergo ubiquitination-mediated proteolysis during early S phase (Nishitani
et al. 2004, 2006). These mechanisms preclude MCMs from loading onto origins
after licensing and therefore prevent relicensing of the origins and hence
rereplication (Blow and Dutta 2005; Lau et al. 2006; Li and Blow 2005).
Following the formation of a functional pre-RC, initiation of DNA replication
requires the activity of Cdks (cyclin-dependent kinases) and Ddk
(Dbf4-dependent kinase, Cdc7) during G1/S to phosphorylate the pre-RC, which
then facilitates the recruitment of MCM10, Cdc45, and GINS, leading to origin
firing during S phase (Sheu and Stillman 2006; Stillman 1996; Zou and Stillman
1998).
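
The ordering logic of licensing described above can be sketched as a deliberately simplified toy model in Python. It only encodes two statements from the text: the factors load sequentially (ORC, Cdc6, Cdt1, then MCM2-7), and MCM2-7 cannot load when Cdt1 is inhibited by Geminin or degraded. The function names and the boolean treatment of Geminin levels are assumptions made for illustration; the real regulation is stoichiometric and far more graded.

```python
# Sequential loading order of pre-RC components, as described in the text.
PRE_RC_ORDER = ["ORC", "Cdc6", "Cdt1", "MCM2-7"]


def license_origin(geminin_high, cdt1_degraded=False):
    """Return the factors loaded at a single origin in this toy model.

    geminin_high: True for S-G2-M (Geminin suppresses Cdt1), False for G1.
    cdt1_degraded: True if Cdt1 has been removed by proteolysis.
    """
    loaded = []
    for factor in PRE_RC_ORDER:
        if factor == "Cdt1" and (geminin_high or cdt1_degraded):
            # Without active Cdt1 the MCM2-7 helicase cannot be recruited.
            break
        loaded.append(factor)
    return loaded


def is_licensed(loaded):
    """An origin counts as licensed only if the full pre-RC has assembled."""
    return loaded == PRE_RC_ORDER


g1 = license_origin(geminin_high=False)
s_phase = license_origin(geminin_high=True)
print("G1:", g1, "licensed:", is_licensed(g1))
print("S phase:", s_phase, "licensed:", is_licensed(s_phase))
```

In this sketch an origin can be licensed only while Geminin is low, which mirrors the point that relicensing, and hence rereplication, is blocked after the G1/S boundary.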


4.1 Mutations of ORC in Meier-Gorlin Syndrome (MGS) 


Origin recognition complex, a six-subunit complex, was first identified in
Saccharomyces cerevisiae as a sequence-specific DNA replication initiator
complex (Bell and Stillman 1992). It binds to origins in an ATP-dependent
manner and recruits other members of the pre-RC. Although ORC proteins are
highly conserved in sequence between yeast and higher eukaryotes, the mechanism
of origin recognition and chromatin association is highly variable (Bell and
Dutta 2002). Unlike in S. cerevisiae, ORCs in higher organisms do not exhibit
DNA sequence-specific origin-binding activity; however, they have been shown to
bind to AT-rich DNA elements (Vashee et al. 2003). In Xenopus laevis, Homo
sapiens, and Drosophila melanogaster, one or more ORC subunits stay bound to
chromatin throughout the cell cycle. While the levels of Orc2-5 do not change
during the cell cycle in human cells, Orc1 is degraded in post-G1 cells (Mendez
et al. 2002). In order to understand the mechanism of ORC stability and
activity, many new ORC-interacting partners have been identified (Shen and
Prasanth 2012), including LRWD1/ORCA (ORC associated) (Bartke et al. 2010; Shen
et al. 2010), which stabilizes ORC on chromatin. In addition to their canonical
role in replication initiation, ORC components have been shown to exhibit a
wide variety of replication-independent functions such as heterochromatin
organization (Prasanth et al. 2010), centriole and centrosome duplication
(Huang et al. 1998; Pak et al. 1997; Prasanth et al. 2010), and cytokinesis
(Prasanth et al. 2002). Although ORC is involved in an extremely important and
highly conserved cellular process, there are few reports implicating ORC
subunits in human disease. Here we will discuss the two major diseases
associated with ORC deregulation.

Recently, Meier-Gorlin syndrome (MGS), a primordial dwarfism syndrome, has been
linked to mutations in genes encoding pre-RC components such as Orc1, Orc4,
Orc6, Cdc6, and Cdt1. This rare autosomal recessive genetic disorder was
originally described by Meier et al. (1959) and subsequently by Gorlin et al.
(1975). However, the term Meier-Gorlin syndrome was first coined by Boles
et al. (1994). It has also been known as ear, patellae, short stature syndrome
(Cohen et al. 1991), characterized by bilateral microtia (small ears),
hypoplastic or absent patellae, short stature, craniofacial anomalies, and
growth retardation (Bongers et al. 2001; Gorlin et al. 1975; Loeys et al.
1999). Genital anomalies and mammary hypoplasia have also been reported in some
cases (Bongers et al. 2001; de Munnik et al. 2012a). More than 60 cases of MGS
have been reported worldwide, and 35 individuals out of 45 patients with MGS
have been shown to carry mutations in five genes encoding pre-RC component
proteins: ORC1, ORC4, ORC6, CDT1, and CDC6 (Bicknell et al. 2011a, b; de Munnik
et al. 2012a, b; Guernsey et al. 2011).

Molecular genetic analysis in two sibs with microcephalic primordial dwarfism
resembling Meier-Gorlin syndrome, from a Saudi Arabian family, identified a
missense mutation in the Orc1 gene (Bicknell et al. 2011b). The mutation was a
homozygous A to G transition (c.314A>G) resulting in a non-conservative amino
acid substitution (E127G) in exon 4 of the Orc1 gene (Bicknell et al. 2011b).
Subsequent analysis in 204 additional individuals with microcephalic primordial
dwarfism further identified biallelic missense mutations, including a recurrent
mutation, R105Q, that changes a conserved amino acid in the N-terminal BAH
domain of ORC1 (Bicknell et al. 2011b). Studies in cell lines established from
MGS patients carrying the mutations E127G and R105Q revealed reduced chromatin
binding of ORC1 and impaired pre-RC assembly. Further, cells carrying ORC1
mutations showed impaired licensing and replication origin activation and
perturbed S phase entry and progression (Bicknell et al. 2011b). Interestingly,
targeting of Orc1 in a zebrafish morphant model significantly reduced embryo
size and caused dwarfism (Bicknell et al. 2011b). Similar growth defects and a
dwarf-like phenotype were also observed upon mcm5 depletion in zebrafish,
suggesting a role for impaired origin licensing in the MGS-like growth
retardation (Bicknell et al. 2011b). A recent study by Kuo et al. (2012) has
further linked ORC1 BAH domain mutations to Meier-Gorlin syndrome. Histone H4
dimethylated at lysine 20 (H4K20me2) is enriched at sites of replication in
diverse metazoans. Orc1 binds to this methylated histone through its BAH domain
and regulates ORC-chromatin binding (Kuo et al. 2012). Disruption of H4K20me2
recognition by ORC1 impairs ORC-chromatin association at replication origins
(Kuo et al. 2012). Interestingly, wild-type human Orc1 mRNA, but not mRNA of
the H4K20me2-binding pocket mutants (hORC1-Y64A and hORC1-W88A), could rescue
the dwarf phenotype of Orc1-depleted zebrafish when co-injected with
orc1-targeting morpholino oligonucleotides (Kuo et al. 2012). Additionally,
zebrafish depleted of H4K20me2 or mice lacking H4K20me2 exhibited growth
defects similar to those of orc1 morphants (Kuo et al. 2012; Schotta et al.
2008), indicating a possible connection between BAH domain mutation and MGS
pathogenesis. Defects in primary cilia formation, cilia function, and
chondro-induction have also been attributed to Orc1 MGS mutations (Stiff et al.
2013). Alteration of centrosome duplication associated with Orc1 mutations that
disrupt Cyclin E-CDK2 kinase inhibition has also been implicated in some
patients with Meier-Gorlin syndrome (Hossain and Stillman 2012). A subsequent
sequencing study by Bicknell et al. (2011b) in patients with MGS, microcephaly,
and profound growth retardation further identified compound heterozygosity for
a splice acceptor site mutation (intron 9 splice acceptor site) and a
frameshift mutation (p.Val667fsX24), respectively, in conjunction with the
previously identified recurrent mutation R105Q (Bicknell et al. 2011a).

Sequencing of the Orc4 gene from individuals with MGS led to the identification
of a homozygous missense mutation, Y174C, affecting the consensus AAA+ (ATPase
associated with a wide range of cellular activities) domain of ORC4 (Bicknell
et al. 2011a). A similar mutation was also independently identified by Guernsey
et al. (2011) in additional patients. Compound heterozygosity for Y174C and a
frameshift mutation causing premature protein truncation has also been reported
in some cases (Bicknell et al. 2011a; Guernsey et al. 2011). The affected
residue, Tyr174, is highly conserved across taxa, maps between the Walker B
motif and the sensor I domain, and has been shown to be crucial for interacting
with a conserved arginine residue on an adjacent helix (Bell 2002; Chuang and
Kelly 1999; Guernsey et al. 2011; Iyer et al. 2004). Functional analysis in the
yeast S. cerevisiae carrying the mutation corresponding to Y174C showed that it
is pathogenic, resulting in a slower growth rate due to a defect in the G1 to S
phase transition (Guernsey et al. 2011; Ladha 2011).

Compound heterozygosity for a loss-of-function mutation caused by a 2 bp
deletion and a missense mutation in the Orc6 gene has been reported in three
siblings of a Turkish family (Bicknell et al. 2011a). Though the Orc6 molecule
is poorly conserved among metazoans, the tyrosine that is substituted by serine
(Y232S) as a result of the missense mutation is highly conserved from yeast to
human (Bicknell et al. 2011a; Bleichert et al. 2013). In Drosophila,
biochemical studies have shown that the conserved amino acids at the C-terminal
end of Orc6, including the tyrosine mutated in MGS patients, are essential for
its assembly with ORC via its interaction with Orc3 (Bleichert et al. 2013). In
a functional study, the Drosophila mutation corresponding to the human Orc6 MGS
mutation (Y225S) reduced recruitment of Orc6 into ORC and weakened its
association with Orc3 (Bleichert et al. 2013). In human cells, a pulldown assay
showed that the MGS mutation in Orc6 likewise affected its binding to ORC1-5
(Bleichert et al. 2013). A similar observation was also made in S. cerevisiae,
suggesting that the MGS mutation in Orc6 interferes with ORC function by
reducing its association with Orc3 (Bleichert et al. 2013).

Mutations have also been reported in well-conserved residues of Cdt1 in the
C-terminal domain of the protein, a potential site for MCM interaction during
origin recognition (Bicknell et al. 2011a; Guernsey et al. 2011; Jee et al.
2010). Another homozygous missense mutation, which substitutes a conserved
threonine residue (T323R) in the Cdc6 gene, has also been identified in a
patient with MGS (Bicknell et al. 2011a). This mutation also lies within the
AAA domain, an ATP-binding domain known for its crucial function during DNA
replication (Bicknell et al. 2011a; Herbig et al. 1999).
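
The missense mutations in this section are written in the compact one-letter form (for example, E127G means that glutamate at position 127 is replaced by glycine). As a reading aid, the following minimal Python sketch parses that shorthand and rewrites it in the three-letter protein style used elsewhere in the text (such as p.Val667fsX24). The helper names are hypothetical; the gene-to-mutation list simply restates the substitutions mentioned above and is not a complete catalogue of MGS mutations.

```python
import re

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

_LETTERS = "".join(THREE_LETTER)
_PATTERN = re.compile(rf"([{_LETTERS}])(\d+)([{_LETTERS}])")


def parse_substitution(code):
    """Split shorthand such as 'E127G' into (reference, position, variant)."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a simple missense substitution: {code!r}")
    ref, pos, alt = match.groups()
    return THREE_LETTER[ref], int(pos), THREE_LETTER[alt]


def to_three_letter(code):
    """Rewrite 'E127G' as 'p.Glu127Gly'."""
    ref, pos, alt = parse_substitution(code)
    return f"p.{ref}{pos}{alt}"


# Missense substitutions named in this section, grouped by gene.
mgs_missense = {
    "ORC1": ["E127G", "R105Q"],
    "ORC4": ["Y174C"],
    "ORC6": ["Y232S"],
    "CDC6": ["T323R"],
}

for gene, codes in mgs_missense.items():
    print(gene, [to_three_letter(code) for code in codes])
```

The parser deliberately rejects anything that is not a single-residue substitution, so frameshift or splice-site descriptions would need separate handling.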

Molecular genetic analysis of patients with MGS clustered all the mutations in
five pre-RC components. Biochemical characterizations further indicate that
these mutations impair the initiation of DNA replication. Therefore, it has
been suggested that defects in replication licensing lead to the clinical
manifestation of growth failure in individuals with Meier-Gorlin syndrome. In
addition, recent work has demonstrated that Orc1 mutations in MGS patients
disrupt an Orc1 CDK inhibitory domain and cause centrosome reduplication
(Hossain and Stillman 2012). This is further supported by the observation that
a reduced efficiency in the formation of cilia due to centrosome defects
contributes to the clinical features associated with MGS (Stiff et al. 2013).
These results suggest that the multi-talented ORC proteins have diverse
functions, and that intact ORC is critical to maintaining genomic integrity.


4.2 Dysregulation of Pre-replication Complex Proteins in Cancer


Any perturbation in the process of origin licensing can have deleterious effects on 
the propagation of faithful replication and consequently could lead to drastic 
genomic instabilities, which are a hallmark of many cancers. Therefore, it is not 
surprising that many of the pre-RC components are upregulated in several cancers. 

Functional depletion studies of pre-RC components such as ORC, Cdc7, and 
Cdt1 in mammalian cells lead to cell cycle arrest and/or cell death (Feng et al. 2003; 
Kim et al. 2002). Specifically, depletion of ORC subunits causes aberrant DNA 
replication, S phase progression, mitotic defects, and cell death (Prasanth et al. 
2002, 2004). Other reports have demonstrated that pre-RC components are essential 
for S phase checkpoint signaling and genomic stability (Clay-Farrace et al. 2003) as 
well. These reports clearly demonstrate that normal cellular levels of pre-RC com- 
ponents are vital for faithful DNA replication and normal cell cycle progression. 
How then could insufficiency in pre-RC components manifest at a physiological 
level? One hypothesis is that insufficiency in pre-RC components could contribute 
to tumorigenesis by inducing genomic instability. 

Licensing proteins have been found to be significantly upregulated in
neoplastic cells but not in quiescent cells, making them highly useful
diagnostic tools for detecting tumors. Cdc6 levels have been reported to
correlate with the presence of neoplastic cells (Semple and Duncker 2004).
Increased expression of Cdc6 and Cdt1 has been observed in cervical, lung, and
brain cancers (Karakaidos et al. 2004). Furthermore, introduction of
Cdt1-expressing NIH3T3 cells into nude mice can drive tumorigenesis (Arentson
et al. 2002). Mice that overexpress Cdt1 have been shown to develop
lymphoblastic lymphomas in a p53 null background (Seo et al. 2005).

MCM proteins have been particularly effective as diagnostic and prognostic
markers for many different tumors (Gonzalez et al. 2005). It is well
established that MCM expression is lost during differentiation, and it is not
present in the differentiated epithelial cells of the cervix, urinary tract,
etc. (Freeman et al. 1999; Gonzalez et al. 2005). Interestingly, it is found to
be upregulated in malignant tumors of these sites (Hiraiwa et al. 1997; Todorov
et al. 1998; Tachibana et al. 2005). MCM7 levels were found to be elevated in
keratinocytic tumors relative to normal cells (Hiraiwa et al. 1997). MCM5
levels were shown to be upregulated in cervical and esophageal cancers
(Williams et al. 1998). MCM8 disruption and an alternative splice form have
been observed in hepatic carcinoma (Gozuacik et al. 2003) and choriocarcinoma
(Johnson et al. 2003), respectively. Elegant work showed that mice carrying a
hypomorphic MCM4 allele, Chaos3, exhibit higher incidence rates of mammary
adenocarcinoma and increased sensitivity to DNA breaks under replication stress
(Shima et al. 2007). This report and several others suggest that partial loss
of individual pre-RC components could lead to genomic instability and
contribute to tumorigenesis. The MCM family of proteins has the potential to be
a diagnostic as well as a prognostic marker for a variety of tumors, including
colorectal tumors, lung cancer, and oral and anal cancer (Giaginis et al. 2010,
2011; Hua et al. 2014). MCMs are routinely used in the clinic, including for
early detection of cancer, and because these nuclear proteins are abundant and
easy to detect, a high-throughput screening approach is possible.

Although these data suggest involvement of pre-RC components in tumorigenesis,
they do not illustrate the pre-RC’s importance in the induction of oncogenesis.
As with loss of function, gain of pre-RC function can also cause serious
replication defects that promote oncogenesis. Overexpression of Cdc6, Cdt1, or
MCM7 in mammalian cells results in abnormal DNA replication and S-G2-M
checkpoint activation (Honeycutt et al. 2006; Karakaidos et al. 2004). An
overabundance of Cdt1 results in rereplication and chromosome fragmentation in
Xenopus (Davidson et al. 2006). Abnormal replication leading to genomic
instability is hypothesized to promote tumorigenesis. Consistent with this
notion, groups have shown that Cdt1 transgenic mice in a p53 null background
develop lymphoblastic lymphoma and that Cdt1 overexpression in NIH3T3 cells can
promote transformation (Seo et al. 2005). Although these reports clearly
demonstrate the oncogenic potential of the pre-RC, there are very few reports
delineating the mechanism by which pre-RC components promote oncogenesis. One
such study demonstrated a detailed mechanism of action for Cdc6 in oncogenesis
via transcriptional repression of the tumor suppressor INK4/ARF locus. Gonzalez
et al. showed that overexpression of Cdc6 causes hypermethylation of the locus,
thereby suppressing its transcriptional output. Cdc6 binds to a replication
origin in the locus, which is also a transcriptional control element, and
recruits a histone deacetylase complex, resulting in hypermethylation and
repression of p14ARF, p16INK4A, and p15INK4B expression (Gonzalez et al. 2006).
The locus encodes three critical cell cycle inhibitors; therefore, repression
of these products alleviates cell cycle control, and together with an abundance
of Cdt1, there is enhanced cellular proliferation, which contributes to
tumorigenesis. Consistent with this, they also show that overexpression of Cdt1
can cooperate with Ras to transform mouse embryonic fibroblasts (MEFs). Taken
together, these data suggest that proper pre-RC function impacts checkpoint
signaling, genome stability, and tumorigenesis.

Insight into the role of pre-RC proteins in tumorigenesis has also come from
understanding the biology of tumor viruses such as Epstein-Barr virus (EBV), a
human herpes virus (HHV). EBV achieves replication of its genome by hijacking
the host cellular replication machinery. EBV was first identified in a cell
line derived from a Burkitt’s lymphoma patient by Epstein and Barr’s group,
hence the name (Epstein et al. 1964). It is one of the most common human
viruses and is thought to infect more than 90% of the human population. EBV is
transmitted through saliva or genital secretions. EBV infection does not
usually elicit strong symptoms. It can remain latent indefinitely inside
B-cells. Under immunodeficient conditions in the host, EBV is activated and
switches to its lytic replication mode, producing millions of virions.
Reactivation of EBV can then result in a range of cancers including epithelial
malignancies, lymphomas, and lymphoproliferative disorders (Maeda et al. 2009;
Tao et al. 2006). EBNA1 (Epstein-Barr nuclear antigen 1), a protein encoded by
the EBV genome, binds to the viral origin of replication, oriP, and then
recruits host cell pre-RC components to initiate replication (Dhar et al.
2001). An Orc2 hypomorphic allele abrogates EBNA1-mediated viral DNA
replication, but this can be rescued by overexpression of wild-type Orc2,
suggesting that host Orc2 is critical for viral genome replication. EBNA1
associates with Orc2, and ChIP experiments show that Orc2 binds to the oriP
sequence (Chaudhuri et al. 2001; Dhar et al. 2001). Orc1 and MCM bind to the
oriP sequence in a cell cycle-dependent manner, while other ORC subunits remain
associated with oriP throughout the cell cycle (Chaudhuri et al. 2001; Ritzi
et al. 2003). Overexpression of Geminin inhibits EBNA1-ORC-mediated replication
of the EBV genome from oriP, and this can be rescued by overexpressing Cdt1
(Dhar et al. 2001). Taken together, these data suggest that EBV hijacks not
only ORC but the entire pre-RC to facilitate replication of its genome.


Conclusion 
DNA replication is essential for normal development, and the central player in 
this field is the pre-RC that ensures faithful propagation of the genome through 
successive cellular divisions. ORCs play a pivotal role in the pre-RC. Detailed 
mapping and characterization of genetic mutations with disease phenotypes such 
as in the case of MGS have provided valuable information with regard to its role 
in normal development. On the other hand, EBV provides a simple and valuable 
tool to study origin licensing by pre-RC and for therapeutic intervention. 
Although there is a growing body of literature implicating pre-RC in cancer, 
there still remain a number of unanswered questions with regard to the precise 
role of pre-RC in oncogenesis. For example, how is the expression of pre-RC 
components deregulated during cancer? Are mutations in pre-RC components 
alone sufficient to drive oncogenesis or do they require secondary mutations in 
key tumor suppressors? How do cancer cells evade replication checkpoint activa- 
tion? Some of these questions warrant extensive investigation that will help elu- 
cidate the oncogenic potential of pre-RC proteins and improve our understanding 
of the importance of proper DNA replication in carcinogenesis.


5 Non-muscle Myosin II Motor Proteins in Human Health and Diseases


Venkaiah Betapudi 


5.1 Introduction 


Man-made machines mediate a diverse range of human activities in the modern
world; likewise, natural myosin motor proteins drive multiple aspects of
cellular life. Myosins belong to a special group of proteins called
mechanochemical enzymes or, colloquially, molecular machines or motor proteins
because of their ability to move on intracellular tracks and convert cellular
free energy released from ATP into mechanical work (Betapudi 2014; Bustamante
et al. 2004). The human body is like a complex machine made up of approximately
37.2 trillion cells (Bianconi et al. 2013), and each individual cell is
equipped with a variety of molecular machines to perform specific mechanical
functions. Based on the mechanical functions they mediate in the cell, they are
classified into polymerization (actin, microtubule, dynamin), cytoskeletal
(myosin, kinesin, dynein), rotary (F0F1-ATP synthase), and nucleic acid (DNA
and RNA polymerases, topoisomerases, helicase, the RSC (remodels the structure
of chromatin) complex, the SWI/SNF complex, structural maintenance of
chromosomes proteins, viral DNA packaging protein) motor proteins (Howard 2014;
Kolomeisky 2013). In addition to these specialized motor proteins, cells
express another unique type of motor protein called “prestin” that is essential
for auditory processing; however, its expression is limited to mammalian
cochlear outer hair cells, where it produces mechanical amplification in the
auditory portion of the ear, the cochlea. Unlike the classical ATP-dependent
motor proteins, this special membrane motor protein with piezoelectric
properties directly converts voltage into mechanical work within microseconds
and thus received its name prestin, after the musical notation “presto”, which
means extremely fast in Italian (Zheng et al. 2000). Another special type of
motor protein, myosin III, with ATPase and kinase activity, has been reported
in the retina of several organisms. This unique motor protein with kinase
activity is believed to play critical roles in mediating visual
phototransduction in rod, cone, and photosensitive ganglion cells of the retina
of many organisms. Thus, cell-type-specific expression of these molecular
machines with dedicated functions is probably a part of nature’s strategy for
the origin and diversification of the eukaryotic cell.

V. Betapudi
Department of Cellular and Molecular Medicine, Lerner Research Institute,
Cleveland Clinic, Cleveland, OH 44195, USA

Department of Physiology and Biophysics, Case Western Reserve University,
Cleveland, OH 44106, USA
e-mail: betapuv@ccf.org; vxb19@case.edu

© Springer Nature Singapore Pte Ltd. 2017
L. Rawal, S. Ali (eds.), Genome Analysis and Human Health,
DOI 10.1007/978-981-10-4298-0_5

Given their ability to operate in a cellular world where Brownian motion and 
viscous forces dominate inertia, these biological machines are also called Brownian 
motors. Interestingly, these cellular motor proteins transduce the ATP-released cel- 
lular free energy into mechanical work more efficiently than man-made combustion 
engines (Kabir et al. 2011; van den Heuvel et al. 2007). Many modern cell biologists 
believe that the mechanical work performed by these cellular motor proteins intersects 
with every facet of cell biology. Indeed, the mechanical work performed by 
these motor proteins drives several cellular activities that are essential for mediating 
reproduction, childbirth, growth, development, immunity, and singing a courtship 
song in fruit flies as well as predisposing human beings to a certain degree of risk 
for various pathological conditions and diseases (Chakravorty et al. 2014; 
Maravillas-Montero and Santos-Argumedo 2012; Min et al. 2014; Pecci et al. 2014; 
Slonska et al. 2012; Stedman et al. 2004). It is largely believed that no biological 
cell can function and operate without the involvement of these multifunctional 
molecular machines. The present chapter is about the discovery, current understand- 
ing, and recent advances in various aspects of myosin motor proteins as well as their 
regulation and relevance to human health and diseases. 





5.2 Discovery of Myosin Motor Proteins 


More than 150 years ago in Heidelberg, Willy Kühne identified a soluble protein 
responsible for muscle rigidity or contraction in the extracts of smooth and striated 
muscles and named it “myosin” (myo- + -ose + -in), colloquially called “Kühne’s 
myosin” (Kühne 1864). Myosin means “within muscle,” and the term “myo” originated 
from the Greek word “mys,” meaning muscle. Several decades later, Kühne’s myosin in 
solution was demonstrated to transmit light occasionally with different velocities 
(Muralt and Edsall 1930). These birefringence properties of asymmetric liquid crystals 
gave the first clue about an unstable uniform shape and size of Kühne’s myosin 
particle. Within four years of the discovery of myosin birefringence properties, 
Lohmann demonstrated that the chemical energy released from ATP is required for muscle 
contraction (Lohmann 1934). This led to the identification of molecules with ATPase 
activities that are present in the extracts of smooth muscles. In less than a decade, 
the ATPase activity of Kühne’s myosin was reported by two independent laboratories 
(Engelhardt and Liubimova 1994). Later, protein purification and crystallization 
studies uncovered the presence of yet another motor protein, called actin, in Kühne’s 
myosin particle (Szent-Gyorgyi 1943; Straub 1942). Nearly three decades after the 
revelation of ATPase activity for Kühne’s myosin, the identification of another protein 
with ATPase activity in Acanthamoeba castellanii (Pollard and Korn 1973) led to the 
discovery of a large number of divergent motor proteins in many organisms. Recent 
technological advancements in genome sequencing, molecular biological techniques, and 
bioinformatic approaches have made it possible to identify more than 145 myosins in 
eukaryotes, except in red algae and diplomonad protists, to date (Vale 2003). This 
large number of myosin motor proteins with significant sequence similarities is 
placed under a single family called the “myosin superfamily.” Recent genome analysis 
studies uncovered 20,687 protein-coding genes in the human genome (Pennisi 
2012), and 40 of them are myosin motor protein-coding genes. However, lower 
eukaryotes such as Drosophila melanogaster (fruit fly), Oryza sativa (rice), 
Arabidopsis thaliana (mouse-ear cress, a small flowering plant), Saccharomyces 
cerevisiae (yeast), and Dictyostelium discoideum (social amoeba) have 13, 14, 17, 
5, and 13 myosin coding genes, respectively. 
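
The gene counts quoted above can be put in proportion with a little arithmetic. The following Python sketch uses only the figures stated in this section; the variable and function names are illustrative assumptions and do not belong to any established tool.

```python
# Myosin gene counts as quoted in this section (names here are illustrative).
HUMAN_PROTEIN_CODING_GENES = 20_687  # Pennisi 2012, as cited in the text

MYOSIN_GENES = {
    "Homo sapiens": 40,
    "Drosophila melanogaster": 13,
    "Oryza sativa": 14,
    "Arabidopsis thaliana": 17,
    "Saccharomyces cerevisiae": 5,
    "Dictyostelium discoideum": 13,
}


def human_myosin_share_percent() -> float:
    """Percentage of human protein-coding genes that encode myosins."""
    return 100.0 * MYOSIN_GENES["Homo sapiens"] / HUMAN_PROTEIN_CODING_GENES


if __name__ == "__main__":
    print(f"Human myosin share: {human_myosin_share_percent():.2f}% of protein-coding genes")
    # List species from the largest to the smallest myosin gene count.
    for species, count in sorted(MYOSIN_GENES.items(), key=lambda kv: -kv[1]):
        print(f"{species}: {count}")
```

On these figures, myosin genes account for roughly 0.2% of human protein-coding genes.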

A recent survey of eukaryotic genomes and phylogenetic analyses of the myosin 
gene family reveal continuous evolution of new myosins, with a significant expansion 
of their abundance throughout eukaryotic evolution predating the origin of 
animal multicellularity (Sebe-Pedros et al. 2014). This may suggest a complex and 
wide variety of cellular roles for these molecular machines in higher organisms. 





5.3 Cellular Processes Mediated by Myosin Motor Proteins 


Similar to other groups of molecular machines that are expressed in different cell 
and tissue types, myosins are abundantly and ubiquitously expressed motor proteins 
in the human body; nonetheless, some of them display cell- and tissue-specific 
expression, perhaps due to their specialization in mediating certain cellular 
functions. Myosins are an essential component of the cell cytoskeleton, a complex 
network of small and microfilaments distributed throughout the cytoplasm. The 
cytoskeleton is made up of proteins encoded by approximately 441 genes in humans. 
Myosins associate with actin filaments to form the “actomyosin system,” an important 
part of the cytoskeleton in the cells. The organization, adaptability, and dynamic 
state of the actomyosin system are critical for maintaining and changing cell 
morphology and polarity during a variety of cellular processes such as cell division, 
migration, endocytosis, intracellular trafficking, microparticle release, and 
apoptotic cell death (Betapudi et al. 2013; Lin et al. 2012). Thus, these molecular 
machines drive a large number of cellular processes that are essential for growth, 
development, maintaining normal physiology, and mediating the death of an organism. 
Myosins also play an important role in creating water-filled body cavities, called a 
“hydroskeleton” or “hydrostatic skeleton,” that are often found in many ectothermic 
organisms and soft-bodied invertebrates such as starfish, sea urchins, and earthworms 
(Serwe et al. 1993). The actomyosin-mediated contraction of the surrounding circular, 
longitudinal, and helical muscles, together with fluid pressure, helps these organisms 
change their shape and produce movement during burrowing and/or swimming (Kier 2012). 
Interestingly, the non-helical hydroskeleton forms a functional basis for the 
mechanical behavior of the mammalian penis (Kelly 2002). Unlike mammals, plants express 
only a few types of myosin motor proteins. Though the specific functions of these 
motor proteins are yet to be uncovered, some of them have been implicated in the 
formation and operation of plasmodesmata that are involved in intercellular 
transportation and communication in plants (Baluska et al. 2001; Wang and Pesacreta 
2004). Myosins play mechanical roles in photosynthesis indirectly by mediating 
intracellular chloroplast distribution in response to external light conditions (Paves 
and Truve 2007). In addition, myosins are involved in mediating intracellular 
trafficking of mitochondria and endoplasmic reticulum in mesophyll cells (Liebe and 
Menzel 1995). 


5.4 Common Structural Features and Classification 
of Myosin Motor Proteins 


Although each myosin motor protein plays a specific role in driving cellular 
processes in the biological world, the majority of them display structural 
similarities. Most myosin motor proteins, with a few exceptions, have a distinct 
N-terminal head or motor domain followed by the neck, trunk, and a C-terminal tail 
domain. The myosin motor domain carries ATPase activity and high binding affinity for 
actin filaments. These special features allow myosin motor proteins to operate on 
actin filaments that are known to spread like intracellular tracks across the 
cytoplasm. The amino acid sequence of the myosin motor domain is conserved among all 
species; however, the sequence of the tail domain remains variable. The tail domain 
with its variable amino acid sequence perhaps carries different binding affinities for 
other cellular proteins and/or cargo. The proteins that bind the tail domain could be 
a part of the cargo and/or regulators of myosin motor activity in the cells. Proteins 
that bind the myosin tail domain may determine the intracellular destinations of these 
motor proteins. All these molecular machines with identical motor domains operate on 
common and/or closely related intracellular tracks with different destinations 
ascribed to their tail domains in a given cellular world. 

The myosin superfamily, which consists of more than 145 members, is categorized 
into different classes based on phylogenetic analysis of their conserved heads, domain 
architectures, specific amino acid polymorphisms, and organismal distributions 
(Foth et al. 2006; Odronitz et al. 2007; Richards and Cavalier-Smith 2005). Roman 
numerals are assigned to each class of myosins. Names are given in alphabetical 
order according to their discovery when more than one myosin of the same class is 
expressed in a given organism. According to the classification of the myosin 
superfamily members, Kühne’s myosin, which was discovered for the first time in muscle 
extracts, was identified as a class II myosin and is therefore called the conventional 
myosin motor protein and/or the founding member of the myosin superfamily. 

The present chapter is focused on the current understanding of class II myosin 
motor proteins and particularly about their regulation and relevance to human health 
and diseases. 


5.5 Class II Myosins/Myosin II Motor Proteins 


Class II myosins or myosin II motor proteins are expressed in all eukaryotes except 
plants. Nearly three dozen class II myosins have been reported throughout the 
eukaryotic kingdom to date (Bagshaw 1993). At least one class II myosin is believed to 
be expressed in every eukaryotic cell. Based on their cell-type expression and motor 
or tail domain sequences, class II myosins are further divided into four 
subclasses: (1) yeast myosins, (2) Dictyostelium or Acanthamoeba myosins, 
(3) skeletal or cardiac or sarcomeric myosins, and (4) vertebrate smooth muscle 
or non-muscle myosins. Class II myosins are believed to have originated in unikonts, 
ancestral eukaryotes with or without a single flagellum, such as amoebozoans, 
fungi, and holozoans (Richards and Cavalier-Smith 2005). While simple unicellular 
organisms like the social amoeba (Dictyostelium) adopted a single class II myosin 
gene, complex multicellular organisms except the fruit fly (Drosophila) acquired 
multiples of them during evolution. Fifteen of the total 40 myosin genes present in the 
human genome encode class II myosins (MYH1, MYH2, MYH3, MYH4, MYH6, 
MYH7, MYH7B, MYH8, MYH9, MYH10, MYH11, MYH13, MYH14, MYH15, 
MYH16); however, not all of them are active (Berg et al. 2001). Skeletal myosin and 
cardiac myosin encoding genes are located on chromosome 17 and chromosome 
14 in humans, respectively. While the sarcomeric myosin encoding gene is located on 
human chromosome 7, genes that encode smooth muscle myosins are located on 
chromosomes 3 and 16. MYH11, located on chromosome 16, undergoes alternative 
splicing and encodes four distinct myosin II isoforms in human smooth muscle cells 
(Matsuoka et al. 1993). MYH8 is an important paralog of MYH11. The MYH9, MYH10, 
and MYH14 genes, located on different chromosomes, encode myosin IIA, myosin 
IIB, and myosin IIC motor proteins, respectively. MYH9, located on chromosome 
22, undergoes alternative splicing to express two isoforms in the cochlea (Li et al. 
2008). MYH10 and MYH14 are located on chromosomes 17 and 19, respectively. 
Both of these genes express two tissue-specific splice variants (Kim et al. 
2005, 2008; Ma et al. 2006). Interestingly, myosin IIA, myosin IIB, and myosin IIC 
are expressed exclusively in non-muscle cells, hence the name non-muscle myosin II 
motor proteins (Golomb et al. 2004; Leal et al. 2003; Simons et al. 1991; Toothaker 
et al. 1991). Myosin IIA, myosin IIB, and myosin IIC are expressed in every human 
non-muscle cell with a few exceptions. However, the expression of these motor proteins 
depends on cell and tissue types (Golomb et al. 2004; Kawamoto and Adelstein 
1991). No tissue or cell type appears to express all three non-muscle myosin II 
motor proteins; however, many cell types express at least one or two of them under 
normal physiological conditions. The relative expression of myosin II motor 
proteins does not remain the same in all cell or tissue types. For instance, non-muscle 
myosin IIA and myosin IIB are expressed in endothelial and epithelial cells at 
similar levels. However, myosin IIB and myosin IIC are expressed abundantly in 
nervous and lung tissue, respectively. Non-muscle myosin IIA is the only conventional 
myosin II motor protein expressed in circulating platelets (Maupin et al. 1994). 
Thus, preferential expression of myosin II motor proteins in different cell types 
reflects their specialization in driving separate, dedicated, and probably 
nonredundant cellular functions. Why no single cell or tissue type expresses 
myosin IIA, myosin IIB, and myosin IIC motor proteins together is not clearly understood. 
Perhaps preferential expression of myosin II paralogs is necessary for maintaining 
different cell and tissue types. No significant amount of non-muscle myosin II 
motor protein is reported in the circulatory system or in any other body fluid under 
normal physiological conditions. However, the presence of myosin IIA and myosin IIB 
is reported in the urine of patients with the hereditary Alport syndrome and other 
kidney diseases (Pohl et al. 2013). 
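
For readers who handle these genes computationally, the gene-to-protein assignments stated above can be held in a small lookup table. The Python sketch below is only an illustration: it records the three non-muscle myosin II genes and the chromosome numbers exactly as given in this section, and the type and function names are assumptions made for the example.

```python
from typing import NamedTuple


class MyosinGene(NamedTuple):
    protein: str     # motor protein (heavy chain) encoded by the gene
    chromosome: int  # human chromosome, as stated in the text


# Only the three non-muscle myosin II genes described in this section.
NON_MUSCLE_MYOSIN_II = {
    "MYH9": MyosinGene("myosin IIA", 22),
    "MYH10": MyosinGene("myosin IIB", 17),
    "MYH14": MyosinGene("myosin IIC", 19),
}


def gene_for(protein: str) -> str:
    """Return the gene symbol that encodes the given non-muscle myosin II protein."""
    for symbol, info in NON_MUSCLE_MYOSIN_II.items():
        if info.protein == protein:
            return symbol
    raise KeyError(f"no gene recorded for {protein!r}")


if __name__ == "__main__":
    for symbol, info in NON_MUSCLE_MYOSIN_II.items():
        print(f"{symbol}: {info.protein}, chromosome {info.chromosome}")
```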

The remaining part of the present chapter is focused on the recent developments 
in our understanding of non-muscle myosin IIA, myosin IIB, and myosin IIC motor 
proteins. 


5.6 Subcellular Localization of Non-muscle 
Myosin II Motor Proteins 


Subcellular localization of a protein provides clues about its cellular functions in a 
given tissue. Non-muscle myosin II motor proteins associate mainly with actin fila- 
ments to generate intracellular contractile forces that are required for mediating 
cellular functions. Therefore, the subcellular localization of these molecular 
machines is linked to actin cytoskeleton dynamics that depend upon the state of a 
given cell. For instance, in a nonmotile quiescent cell, non-muscle myosin II motor 
proteins localize mostly in the cytoplasm, and some remain associated with actin 
filaments. However, these motor proteins are also found in the nuclei of proliferating 
myoblasts (Rodgers 2005). Interestingly, these cytosolic myosin II motor proteins 
undergo transient localization to the cleavage furrow during cytokinesis by 
unknown mechanisms. It is largely believed that the myosin II-generated contractile 
forces mediate the separation of two daughter cells during cytokinesis; however, 
their specific roles and underlying mechanisms are not clearly understood. Recent 
studies have shown that non-muscle myosin IIB is required for the completion of 
meiotic cytokinesis in male but not in female mice (Yang et al. 2012). Myosin IIB 
has been shown to prevent endomitosis or polyploidization during differentiation of 
megakaryocytes (Lordier et al. 2012). Myosin IIB also regulates the enucleation 
process in erythrocytes, which is akin to cytokinesis in other cells (Ubukawa et al. 2012). 

Cell migration has been viewed as an index of cellular life because of its 
importance in maintaining growth, development, normal physiology, and immunity of a 
given organism. During this complex process, cells display frequent changes in their 
shapes and make attachments with the matrix and neighboring cells. The cell 
cytoskeleton plays critical roles in driving this essential cellular process. The 
actomyosin system, an essential component of the cytoskeleton, is extensively studied 
in migrating cells. The actomyosin system generates contractile forces to mediate cell 
dynamics during migration. However, nature has limited this extraordinary ability 
of migration to animal and other eukaryotic cells, excluding plant cells. Unlike 
animal and other eukaryotic cells, plant cells cannot change their shapes, interact 
with the matrix, or extend lamellipodia, probably due to their rigid cell wall and 
absence of class II myosins. Both mechanical and functional roles of myosin IIA and 
myosin IIB have been extensively investigated in migrating cells for the past two 
decades. Many laboratories have reported that myosin IIA and myosin IIB play specific 
roles in mediating cell dynamics during migration. Cells initiate migration by 
extending their membrane in the form of lamellipodia, perhaps as a part of a strategy 
to probe the environment for favorable conditions and proper directions. During 
migration, cells display periodic extension and retraction of their lamellipodia by an 
unknown mechanism. Interestingly, the cytosolic myosin IIA and myosin IIB translocate 
to the lamellipodia during cell migration. Myosin IIA and myosin IIB display distinct 
localization and drive lamellipodia dynamics in opposite directions. On one hand, 
myosin IIB promotes lamellipodia and growth cone extension, and on the other, myosin 
IIA drives retraction of the lamellipodia during cell migration (Betapudi 2010; Brown 
and Bridgman 2003; Rochlin et al. 1995). However, the underlying mechanisms of 
myosin IIA and IIB transient localization to lamellipodia and their interacting 
proteins during cell migration are not clearly understood. The intracellular 
localization and specific roles of myosin IIC in driving cell migration are likewise 
not clearly understood. Myosin II activity is also necessary for keratinocyte 
migration, a critical step in the re-epithelialization of human skin wounds. Both 
myosin IIA and myosin IIB display transient localization to the lamellipodia during 
keratinocyte migration (Betapudi et al. 2010). Keratinocytes do not express myosin 
IIC motor proteins. 

In addition to their localization to lamellipodia, myosin II motor proteins display 
specific subcellular localization in quiescent cells. Both myosin IIA and myosin IIB 
are found in the Golgi complex. Localization of myosin II in the nucleus and to 
nuclear membrane is yet to be uncovered. Myosin II motor proteins are not trans- 
membrane proteins; however, they localize to cell membrane in order to mediate 
internalization of the cell surface receptors including EGFR and CXCR4 (Kim et al. 
2012; Rey et al. 2007). Also, myosin II localizes to contractile vacuoles in aquatic 
microorganisms for unknown reasons. Contractile vacuoles are presumed to 
be homologues of lysosomes that are known to carry enzymes necessary for break- 
down of waste materials and cellular debris in higher organisms. Myosin II-mediated 
mechanical forces have been implicated in operating contractile vacuoles probably 
to expel additional water and toxic materials from amoeba in hypo-osmotic condi- 
tions (Betapudi and Egelhoff 2009). Myosin II motor proteins are also found on the 
outside of the cells infected with virus and implicated in mediating viral infection 
(Arii et al. 2010; van Leeuwen et al. 2002). Localization of myosin II is yet to be 
identified in extracellular microvesicles that are known to make intercellular com- 
munications; however, their activity is necessary for microvesicle secretion from 
endothelial cells treated with antiphospholipid syndrome autoantibodies (Betapudi 
et al. 2010). The intracellular localization and specific roles of myosin II in cells 
undergoing apoptotic death are not clearly understood; however, their mediation is 
presumed to be required for the execution of cell death (Flynn and Helfman 2010; 
Solinet and Vitale 2008; Tang et al. 2011). 

Thus, myosin II motor proteins display different subcellular localization to play 
specific roles in mediating cellular processes that are necessary for growth, 
development, and death. For instance, a lower eukaryote like amoeba can survive with 
certain developmental defects in the absence of myosin II motor protein (Xu et al. 
1996). However, the absence of either myosin IIA or myosin IIB or myosin IIC motor 
protein is lethal for mouse embryo growth and development (Conti and 
Adelstein 2008). 





5.7 Assembly of Non-muscle Myosin II Motor 
Protein in the Cells 


Unlike many other proteins with enzyme activity, no single polypeptide alone is 
known to exist and function as a molecular machine in any biological system. Every 
motor protein exists as a globular multiprotein complex and performs specific 
functions in the cells. Similar to the multiple components used in building a man-made 
machine, many polypeptides are involved in the assembly of a functional biological 
molecular machine. For example, cells use six non-covalently associated polypeptides 
to assemble a functional non-muscle myosin II motor protein complex 
with an average molecular weight of 520 kDa. Though the underlying assembly 
mechanism is not clearly known, two myosin heavy chain (MHC) polypeptides 
and four light chain polypeptides build a functional non-muscle myosin II motor 
protein in the cells. The MHCs of myosin IIA, myosin IIB, and myosin IIC motor 
proteins are encoded by the MYH9, MYH10, and MYH14 genes in humans, respectively. 
Each MHC, with an average molecular weight of 220 kDa, has an isoelectric 
point of 6.8. Every myosin II motor complex comprises a homodimer of a specific 
MHC; however, recent studies show the existence of heterodimers due to significant 
sequence similarity. The light chain polypeptides, encoded by different non-myosin 
genes, are divided into essential light chains (ELC) and regulatory light 
chains (RLC) based on their specific functional roles in operating myosin II motor 
complexes in cells. Each myosin II motor complex comprises two essential light 
chains and two regulatory light chains. Based on their extraction methods, ELC and 
RLC are also called alkali and 5,5′-dithiobis(2-nitrobenzoate) (DTNB) light chains, 
respectively. Compared with the MHC polypeptide, both ELC and RLC 
proteins are very small, with molecular weights of 16 and 22 kDa, respectively. While 
the MHC homodimer forms the backbone of the myosin II motor complex, light chains 
are mostly involved in controlling motor activity. Myosin heavy chains are specific for 
each motor protein complex; however, both ELC and RLC are commonly found in 
all myosin II motor protein complexes. Although tissue-specific alternatively spliced 
MHC, ELC, and RLC polypeptides are expressed in higher eukaryotes, their 
specific roles are not clearly understood to date. Apart from MHC, ELC, and RLC, 
no other protein has been identified as a component of the non-muscle myosin II motor 
protein complex in the cells. 
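
As a consistency check on the figures above, the mass of the hexameric complex can be summed from its subunits. The Python sketch below assumes only the approximate subunit masses and copy numbers quoted in this section; the names are illustrative.

```python
# Approximate subunit masses in kDa, as quoted in the text.
SUBUNIT_MASS_KDA = {"MHC": 220, "ELC": 16, "RLC": 22}

# Copies of each polypeptide per myosin II complex, as described in the text.
SUBUNIT_COPIES = {"MHC": 2, "ELC": 2, "RLC": 2}


def complex_mass_kda() -> int:
    """Sum of subunit masses for one six-polypeptide myosin II complex."""
    return sum(SUBUNIT_MASS_KDA[name] * SUBUNIT_COPIES[name] for name in SUBUNIT_MASS_KDA)


if __name__ == "__main__":
    # 2 x 220 + 2 x 16 + 2 x 22
    print(f"Summed subunit mass: {complex_mass_kda()} kDa")
```

The subunits sum to 516 kDa, which agrees with the quoted average of about 520 kDa once the approximate nature of the individual masses is taken into account.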

The assembly process of the myosin II motor protein complex occurs in the Golgi 
complex. This understudied assembly process is mediated by the UCS (UNC-45/Cro1/ 
She4) chaperone and many other cellular proteins. As a part of this complicated 
assembly process, both myosin heavy and light chains undergo proper folding in 
order to build a functional myosin II motor protein complex in the Golgi 
apparatus (Gazda et al. 2013; Hellerschmied and Clausen 2014). This assembly 
process, which is common for building myosin IIA, myosin IIB, and myosin IIC 
motor proteins, remains elusive. Though the transcriptional regulation of ELC and 
RLC is poorly understood, the MHC expression of myosin IIA, myosin IIB, and 
myosin IIC is under the control of a housekeeping promoter having no TATA 
element, a core sequence commonly found in the promoters of 24% of human genes 
(Kawamoto 1994; Weir and Chen 1996). Thus, cells keep these multifunctional 
myosin II motor proteins readily available for mediating cellular functions. However, 
differential expression of MHC occurs in pathophysiological conditions. Serum and 
mitotic stimulants are known to induce differential expression of MHC (Kawamoto 
and Adelstein 1991; Toothaker et al. 1991). Differential expression of myosin II 
motor proteins has been implicated in the aggressive growth and metastasis of 
cancer cells. 








5.8 Operation of Myosin II Motor Protein in the Cells 


In response to external and internal cues, both heavy and light chains play specific 
roles in operating myosin II motor complexes in the cells. The heavy chains that 
form the backbone of the myosin II motor complex can be subdivided into a distinct 
head, neck, and tail domains. Each domain plays a specific role in building and 
operating these molecular machines in the cells. The N-terminus of the heavy chain 
starts with a globular head domain followed by a small neck region that is linked to 
a long alpha-helical tail domain. The N-terminal head domain carries ATPase activ- 
ity in order to release free energy in the cells. Thus, head domain is also called 
motor domain or functional engine of these molecular machines. In addition to 
ATPase activity, motor domain carries high binding affinity for actin filaments. 
Because of its high binding affinity for actin filaments, the operation of these bio- 
logical machines is restricted to actin filaments only despite cells building a com- 
plex network of intracellular tracks for trafficking and many other functions. Motor 
domain upon hydrolysis of ATP undergoes conformational change. This affects 
motor domain interaction with actin filaments, a key element of the cell strategy to 
generate mechanotransduction for performing various functions. The heavy chains 
of non-muscle myosin IIA, myosin IIB, and myosin IIC display a significant protein 
sequence similarity in their motor domains. However, all of them carry different 
binding affinities for actin filaments. Thus, myosin IIA, myosin IIB, and myosin IIC 
motor proteins mediate mechanotransduction with different energetic efficiencies in 
the cells. This probably suggests that myosin IIA, myosin IIB, and myosin IIC 
motor proteins are meant for playing different, nonredundant roles in the cells. 

The motor domain of class II myosins is followed by a short neck region that consists 
of two conserved IQ motifs (IQxxxRGxxxR); however, myosins of other classes 
may have more or fewer than two IQ motifs (Cheney and Mooseker 1992). The IQ 
motifs of the neck region form an amphiphilic uninterrupted seven-turn α-helix that 
has high binding affinity for myosin light chains and/or calmodulin in a 
Ca2+-independent manner. This high binding affinity allows ELC and RLC to occupy the 
first and second IQ motifs of the neck region, respectively. Both ELC and RLC, by 
binding IQ motifs, give stability to MHC. RLC also plays an additional role by 
offering functional regulation of MHC. The IQ motifs of the neck region allow light 
chains to acquire either a compact or an extended conformation. Thus, the light 
chain-bound neck region functions like a linker and lever arm for the myosin II motor 
domain to amplify energy conversion into mechanical work. The length of the neck 
region is presumed to have a direct impact on motor speed and energy transduction 
into mechanical work (Uyeda et al. 1996). Except class XIV Toxoplasma myosin A, 
the neck region of every myosin carries IQ motifs (Heintzelman and Schwartzman 
1997). The IQ motifs, approximately 25 amino acids in length, are present in 
many other myosin heavy chains. This allows ELC to bind heavy chains of other 
myosins of class V, VI, and VII carrying IQ motifs; however, RLC binds exclusively 
to myosins of class II and XVIII (Chen et al. 2007; Tan et al. 2008). 
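
The IQ consensus quoted above lends itself to a simple pattern search. The following Python sketch is a minimal illustration, assuming the core consensus IQxxxRGxxxR with x standing for any residue; the function name and the toy sequence are invented for the example and do not represent a real myosin neck region. Real IQ motifs tolerate substitutions, so a strict pattern of this kind would miss degenerate motifs.

```python
import re

# Core IQ consensus as written in the text: IQxxxRGxxxR, where x is any residue.
IQ_CORE = re.compile(r"IQ.{3}RG.{3}R")


def find_iq_cores(sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping IQ core matches."""
    return [match.start() for match in IQ_CORE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up toy sequence with two planted cores; not a real protein sequence.
    toy_sequence = "MKLAIQSAFRGYLARKEYAAAIQKNVRGWLLRNW"
    print(find_iq_cores(toy_sequence))
```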

The neck region of the motor domain is followed by a long tail domain with variable 
amino acid sequences. This long tail domain with coiled-coil α-helices ends 
in a short non-helical tailpiece. Interestingly, the coiled-coil tail domains of two 
heavy chains undergo homodimerization in order to form a single rodlike structure. 
This allows myosin II motor complex to have two functional engines or motor 
domains with a single coiled-coil rodlike structure, hence double-headed myosin II 
motor protein or double-engined molecular machine. This double-engined myosin 
II motor protein exists in compact and linear forms in the cells. Myosin II attains a 
compact folded conformation due to a “proline-kink” at the junction of head and rod 
domains and attachment of its C-terminal tail domain to RLC (Craig et al. 1983; 
Onishi and Wakabayashi 1982; Trybus et al. 1982). Thus, depending upon cell 
requirement, myosin II can exist either in linear or compact conformations in the 
cells. Myosin II with compact folded structure sediments at 10S (Svedberg) hence 
10S form. Myosin II in 10S form displays high binding affinity for ADP and inor- 
ganic phosphate (Pi) and no ATPase activity (Cross et al. 1986, 1988). However, 
myosin II in the linear elongated conformation, attained upon detachment of its 
C-terminal tail end from RLC, becomes active with high binding affinity for ATP. The 
active myosin II motor complex in the elongated form sediments at 6S and is therefore 
called the 6S form (Trybus and Lowey 1984). Interestingly, myosin II motor proteins in 
the elongated form have the tendency to assemble into a highly ordered parallel and 
antiparallel thick filament due to intermolecular interactions between coiled-coil 
rod domains. Thus, myosin II activation and formation into a thick filament is one 
of the most important steps in the process of generating contractile forces in the 
cells. RLC plays critical roles in regulating filamentation by controlling linear and 
compact formation of myosin II motor proteins in the cells. Myosin II tail domains 
form large aggregates without proper filamentation in the absence of RLC (Pastra- 
Landis and Lowey 1986; Rottbauer et al. 2006). Thus, the RLC-controlled myosin 
rod filamentation and motor domain interaction with actin filaments are the most 
important aspects of cell strategy for converting ATP-released cellular free energy 
into force and mechanical work using non-muscle myosin II motor proteins. 
5.9 Regulation of RLC Phosphorylation and Myosin II
Motor Activity 


Despite carrying 60-80% sequence similarity at the amino acid level and the same
quaternary structure, the non-muscle myosin IIA, myosin IIB, and myosin IIC paralogs
appear to have diverged from a common ancestor more than 600 million years ago.
However, in response to internal and external cues, these molecular machines 
undergo different regulatory mechanisms in the cells (Jung et al. 2008). The role of 
RLC in regulating myosin II motor protein activity is extensively studied in a wide 
variety of biological systems since its discovery in rabbit skeletal muscle myosins 
more than three decades ago (Casadei et al. 1984). The RLC peptide does not exist
alone in the cells; however, while it remains associated with the IQ motif of the
neck region, it undergoes phosphorylation and dephosphorylation on its S1, S2, T9,
T18, and S19 amino acids in order to turn myosin II motor activity on and off in the
cells. RLC phosphorylation on S19 alone or on both T18 and S19 amino acids turns
on myosin II motor activity with increased ATPase activity and the elongated 6S
conformation that allows simultaneous assembly of myosin rods into thick filaments
(Betapudi et al. 2006, 2010; Somlyo and Somlyo 2003; Wendt et al. 2001). However, 
phosphorylation of RLC has no effect on the affinity of myosin motor domain for 
actin filaments in the cells (Sellers et al. 1982). Dephosphorylation of T18 and S19
amino acids or RLC phosphorylation on S1, S2, and T9 induces myosin II motor
proteins to acquire the compact 10S conformation with no myosin rods available for
thick filamentation in the cells. Such site-specific phosphorylation of RLC turns off
myosin II motor protein activity in the cells. Thus, RLC by undergoing reversible
phosphorylation on certain specific amino acids plays a major role in regulating
the activity of myosin II motor protein complexes in a wide variety of cell and 
tissue types. 
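
The on/off rules described in this paragraph can be summarized as a small lookup.
The following Python sketch is illustrative only: the function name, the state
labels, and the tie-breaking choices (any inhibitory site taking precedence, and
T18 alone not counting as activating) are assumptions made for the example, not
findings reported in the text.

```python
# Illustrative sketch only: encodes the RLC on/off rules stated in the text.
# Site names follow the site list given above (S1, S2, T9, T18, S19).
ACTIVATING_SITES = frozenset({"T18", "S19"})
INHIBITORY_SITES = frozenset({"S1", "S2", "T9"})

ACTIVE = "6S (elongated, active)"
INACTIVE = "10S (compact, inactive)"


def myosin_ii_state(phosphorylated_sites):
    """Return the expected myosin II form for a set of phosphorylated RLC sites."""
    sites = set(phosphorylated_sites)
    unknown = sites - ACTIVATING_SITES - INHIBITORY_SITES
    if unknown:
        raise ValueError(f"unrecognized RLC sites: {sorted(unknown)}")
    # Assumption: phosphorylation of any inhibitory site overrides activation.
    if sites & INHIBITORY_SITES:
        return INACTIVE
    # Text: S19 alone, or T18 together with S19, turns motor activity on.
    if "S19" in sites:
        return ACTIVE
    # Text: dephosphorylated T18/S19 gives the compact 10S form.
    # Assumption: T18 alone is treated as not activating.
    return INACTIVE


if __name__ == "__main__":
    for example in ([], ["S19"], ["T18", "S19"], ["S1", "S2", "T9"]):
        print(example, "->", myosin_ii_state(example))
```

The lookup deliberately ignores kinetics, kinase identity, and cell type; it only
restates which combinations of sites the text associates with each form.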

We now know enough about the regulatory mechanisms of RLC reversible 
phosphorylation in normal and abnormal physiological conditions. RLC revers- 
ible phosphorylation is tightly regulated by both myosin-specific phosphatase 
and a wide variety of protein kinases in the cells. In response to external and 
internal cues, a wide variety of protein kinases phosphorylate RLC in order to 
regulate the activity of myosin II motor proteins in the cells. Protein kinases 
including myosin light chain kinase (MLCK/MYLK), rho-associated coiled-coil- 
containing kinase (ROCK), citron kinase or citron rho-interactive kinase (CRIK) 
or serine/threonine-protein kinase 21 (STK21), leucine zipper interacting protein 
kinase (ZIPK) or death associated protein kinase 3 (DAPK3), and myotonic dys- 
trophy kinase-related CDC42-binding kinase (MRCK/CDC42BP) directly phos- 
phorylate T18 and S19 amino acids of RLC to activate myosin II in the cells. 
However, protein kinase C (PKC) phosphorylates S1, S2, and T9 amino acids of
RLC to inactivate myosin II in cells undergoing cytokinesis (Nishikawa et al.
1984). Interestingly, these RLC phosphorylating kinases not only display specific
subcellular localizations but also respond to a wide variety of signaling pathways
in many settings. For instance, MLCK, in response to Ca2+-calmodulin,
phosphorylates RLC to activate myosin II that is localized next to the cell membrane
(Totsukawa et al. 2004). This site-specific subcellular localization and MLCK 
activation are governed by many other upstream protein kinases such as p21 acti- 
vated kinase 1 (PAK1), Abl tyrosine kinase, Src, and arrest defective 1 in differ- 
ent cell types (Dudek et al. 2004; Sanders et al. 1999; Shin et al. 2008). A small 
GTP-binding protein RhoA activates ROCK and citron kinase in the central part 
of the cell. Shroom3, an actin binding protein, regulates ROCK subcellular local- 
ization and RLC phosphorylation in neuroepithelial cells (Haigo et al. 2003; 
Hildebrand 2005). Cell death and survival regulating DAPK3 displays nuclear 
localization and phosphorylates RLC in apoptotic cells in a Ca2+/calmodulin-
independent manner (Murata-Hori et al. 1999). PKC regulates myosin II activity
by phosphorylating RLC in the presence of Ca2+ and DAG (diacylglycerol) and/
or phorbol esters in mitotic cells (Varlamova et al. 2001). The subcellular site- 
specific RLC phosphorylation, dephosphorylation, and myosin II activation are 
tightly controlled by protein phosphatase 1 (PP1), a ubiquitously expressed myo- 
sin-specific phosphatase in the cells (Matsumura and Hartshorne 2008; Rai and 
Egelhoff 2011; Xia et al. 2005). Protein kinases and phosphatases that are 
involved in regulating RLC phosphorylation also phosphorylate other substrates 
in the cells. For instance, MLCK phosphorylates proline-rich protein tyrosine
kinase 2 (PYK2/PTK2B), also called focal adhesion kinase 2 (FAK2), which is known to
promote lung vascular endothelial cell permeability during sepsis (Xu et al.
2008). ROCK directly phosphorylates LIM kinase and myosin phosphatase
MYPT1, a regulatory subunit of PP1, in many types of cells and tissues (Kimura
et al. 1996; Leung et al. 1996). MYPT1 phosphorylation results in inactivation
of PP1, which leads to a significant increase in RLC phosphorylation and
myosin II activation in the cells. ZIPK, MRCK, and PKC are implicated in regulating
MYPT1 phosphorylation in many cell and tissue types. PKC is also involved in 
phosphorylating MHC to regulate myosin II activity in the cells. Thus, a wide 
variety of protein kinases and phosphatases are involved in regulating RLC
phosphorylation in higher organisms. However, lower eukaryotes appear to have only
a few kinases and phosphatases to regulate RLC phosphorylation and myosin II
activity. For instance, MLCK-A is the only RLC phosphorylating kinase identified in
Dictyostelium discoideum to date (Tan and Spudich 1990). Unlike MLCK in
higher organisms, MLCK-A was shown to phosphorylate S13 of RLC in a
Ca2+-calmodulin-independent manner (Tan and Spudich 1990). RLC phosphorylation
on the S13 amino acid increases myosin II motor activity and regulates cell
morphological changes without affecting normal growth and development of Dictyostelium
discoideum (Chen et al. 1994; Griffith et al. 1987; Liu et al. 1998; Matsumura 
2005; Uyeda et al. 1996). Although many cellular proteins are now known to
undergo more than 200 distinct posttranslational modifications with structural
and functional diversity, no posttranslational modification of RLC other than
reversible phosphorylation has been reported to date to play a role in regulating
either myosin II motor activity or filamentation.
5.10 Regulation of MHC Phosphorylation and Myosin II
Motor Activity 


Myosin heavy chain phosphorylation was first identified in macrophages more than 
three decades ago, and then its role was linked to myosin II filamentation and local- 
ization in lower eukaryotes such as Acanthamoeba and Dictyostelium discoideum 
(Barylko et al. 1986; Collins and Korn 1980; Egelhoff et al. 1993; Kuczmarski and 
Spudich 1980; Kuznicki et al. 1983; Pasternak et al. 1989; Trotter 1982; Trotter 
et al. 1985). We now know that MHC phosphorylation plays a major role in regulat- 
ing myosin II activity in a wide variety of cell and tissue types. Recent advanced 
phosphorylation prediction tools revealed multiple putative phosphorylation sites 
with their cognate protein kinases on the heavy chains of non-muscle myosin IIA, 
myosin IIB, and myosin IIC. Although bioinformatic tools predicted multiple
putative phosphorylation sites in the motor, neck, and tail domains of MHC, only a
few sites with their cognate protein kinases in the coiled-coil and non-helical
tail regions are known to date. For instance, the MHC of non-muscle myosin IIA with
1960 amino acids was predicted to undergo phosphorylation on 132 different residues;
however, phosphorylation on only a few residues, such as T1800, S1803, and
S1808 in the coiled-coil region and S1943 in the non-helical tail region, has been
reported to date. The MHC of myosin IIB with 1976 amino acids was predicted to
undergo phosphorylation on 122 different residues, and more than 135 putative
phosphorylation sites were predicted on the myosin IIC heavy chain with 1995 amino
acids; however, phosphorylation of only a few sites in the coiled-coil and
non-helical tail regions of their C-terminal ends has been identified to date
(Dulyaninova and Bresnick 2013). It appears
that MHC phosphorylation is regulated by tyrosine-, serine-, and threonine-specific 
kinases in the cells. Interestingly, the majority of the putative phosphorylated amino
acids in the motor domains of all three myosin heavy chains are tyrosine residues; 
however, their tail domains are heavily targeted by serine- and threonine-specific 
protein kinases. For instance, serine- and threonine-specific kinases like casein 
kinase 2 (CK2), PKC members, and alpha-kinase family members are now known 
to phosphorylate C-terminal ends of all three myosin heavy chains in normal physi- 
ological and pathological conditions (Clark et al. 2008a, b; Dulyaninova et al. 2005; 
Murakami et al. 1998; Ronen and Ravid 2009). PKC members phosphorylate S1916
and S1937 residues of myosin IIA and myosin IIB, respectively (Conti et al. 1991;
Even-Faitelson and Ravid 2006). PKC members also phosphorylate multiple other
serine residues in myosin IIB and threonine residues in myosin IIC coiled-coil
regions (Murakami et al. 1998; Ronen and Ravid 2009). CK2 is known to
phosphorylate the S1943 residue in the non-helical tail region of myosin IIA in
vitro, and its role has been implicated in regulating myosin II assembly and
localization in pathological conditions. However, recent in vitro studies including
chemical inhibition and siRNA-mediated depletion of CK2 showed no significant change
in S1943 phosphorylation and breast cancer cell migration on fibronectin-coated
surfaces (Betapudi et al. 2011). CK2 is also known to phosphorylate multiple serine
and threonine residues in the coiled-coil and non-helical tail regions of non-muscle
myosin IIB and myosin IIC heavy chains in many cell types (Murakami et al. 1998;
Ronen and Ravid 2009; Rosenberg et al. 2013). Therefore, the role of CK2 in
regulating non-muscle myosin II-mediated cellular functions in certain specific
pathological conditions cannot be ruled out. Protein kinases that target tyrosine
residues of the non-muscle myosin II motor domain have not been identified.
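
To put the predicted site counts quoted above in proportion, the short Python
sketch below divides each count by the length of the corresponding heavy chain.
The dictionary layout and the function name are illustrative assumptions; the
count for myosin IIC is given in the text as "more than 135" and is treated here
as a lower bound.

```python
# Predicted phosphorylation sites and heavy chain lengths as quoted in the text.
# (predicted_sites, amino_acids); the IIC count is a lower bound ("more than 135").
PREDICTED = {
    "myosin IIA": (132, 1960),
    "myosin IIB": (122, 1976),
    "myosin IIC": (135, 1995),
}


def predicted_site_fraction(paralog):
    """Fraction of heavy chain residues predicted to be phosphorylated."""
    sites, length = PREDICTED[paralog]
    return sites / length


if __name__ == "__main__":
    for paralog in PREDICTED:
        fraction = predicted_site_fraction(paralog)
        print(f"{paralog}: {fraction:.1%} of residues predicted")
```

On these figures, roughly 6 to 7% of the residues in each heavy chain are
predicted sites, whereas only a handful have been confirmed experimentally.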

The role of alpha-kinase family members in regulating MHC phosphorylation is 
extensively studied in mammals and social amoeba. Alpha-kinase family members 
are serine-threonine-protein kinases that belong to a small and unique group of 
protein kinases with catalytic domains having no significant similarity at amino acid 
level with the catalytic domains of conventional protein kinases (De la Roche et al. 
2002; Middelbeek et al. 2010; Ryazanov et al. 1999; Scheeff and Bourne 2005). 
Unlike conventional protein kinases that are known to phosphorylate amino acid
residues present in β-turns, loops, and irregular structures of their substrates,
eukaryotic translation initiation factor 2 kinase (eIF2 kinase), the first member
of the alpha-kinase family, shows an unusual propensity to phosphorylate amino acid
residues located in the α-turns of its cellular substrates, hence the name α-kinase
(Luck-Vielmetter et al. 1990; Vaillancourt et al. 1988). However, recent in vitro
phosphorylation studies demonstrated that other members of the alpha-kinase family
also target amino acids
of the non-alpha helical structures of their cellular substrates (Clark et al. 2008a; 
Jorgensen et al. 2003). Members of the alpha-kinase family have been identified only
in humans and amoebae to date (Ryazanov et al. 1999; Scheeff and Bourne 2005). The
human genome carries six different genes that encode alpha-kinases. Among them, 
transient receptor potential melastatin 6 (TRPM6) and transient receptor potential 
melastatin 7 (TRPM7) kinases are extensively studied to date. TRPM6 and TRPM7 
are bifunctional protein kinases that belong to a large protein family of transient 
receptor potential cation channels. These protein kinases play critical roles in sens- 
ing mechanical stress, pain, temperature, taste, touch, and osmolarity (Mene et al. 
2013; Middelbeek et al. 2010; Ramsey et al. 2006; Runnels 2011; Su et al. 2010). 
Both TRPM6 and TRPM7 kinases phosphorylate T1800, S1803, and S1808
residues in the coiled-coil region of MHC to control myosin IIA filamentation and
association with actin filaments (Clark et al. 2008a, b). In addition, these multifunc- 
tional alpha-kinases phosphorylate many amino acids in the non-helical tail regions 
of myosin IIB and myosin IIC to control their filamentation and cellular roles. The 
MHC of myosin II is the only known substrate for TRPM6; however, TRPM7 can 
also phosphorylate annexin I or lipocortin I, a calcium/phospholipid-binding pro- 
tein that promotes membrane fusion and exocytosis. The members of alpha-kinase 
family are also extensively studied in Dictyostelium discoideum. Dictyostelium 
expresses MHCK-A, MHCK-B, MHCK-C, MHCK-D, and vWFA alpha-kinases. 
Except vWFA kinase, all of them prefer to phosphorylate T1823, T1833, and T2029 
residues in the tail region of myosin II in Dictyostelium (De la Roche et al. 2002; 
Egelhoff et al. 2005; Underwood et al. 2010; Yumura et al. 2005). Alpha-kinases by 
phosphorylating these sites control myosin II filamentation and play critical roles in 
regulating growth and development of Dictyostelium discoideum. Although vWFA
kinase fails to phosphorylate myosin II heavy chain in vitro, this special
alpha-kinase regulates the expression and filamentation of myosin II heavy chain by
an unknown mechanism (Betapudi et al. 2005). Unlike other alpha-kinase family
members, vWFA kinase localizes to contractile vacuoles that are known to expel 
toxic metals and excess water from the cytoplasm of Dictyostelium discoideum. 
This special alpha-kinase appears to play critical roles in controlling the myosin 
II-mediated mechanical work implicated in regulating the dynamics of contractile 
vacuoles and survival of Dictyostelium discoideum in abnormal osmotic conditions; 
however, the underlying mechanisms are not clearly understood (Betapudi and 
Egelhoff 2009). It has been proposed that vWFA kinase protects Dictyostelium
discoideum from osmotic shock death by regulating myosin II heavy chain expression and
filamentation (Betapudi and Egelhoff 2009). 

No phosphatase that is specific to the heavy chains of non-muscle myosin II
motor proteins has been reported in mammals to date. However, the expression of myosin
II heavy chain phosphatase has been reported in Dictyostelium discoideum (Murphy 
and Egelhoff 1999). Therefore, not much is known about the mechanism of dephos- 
phorylation of myosin II heavy chains in the cells. 





5.11 Non-muscle Myosin II-Interacting Proteins


In addition to protein kinases and phosphatases, several other cellular proteins are 
involved in regulating non-muscle myosin II motor proteins in the cells. Some of 
these myosin II regulating non-enzymatic cellular proteins are S100A4, lethal giant 
larvae (Lgl), myosin binding protein H, and S100P. These proteins interact directly 
with myosin II heavy chain to control phosphorylation and filament assembly in 
flies and mammals (Du et al. 2012; Ford et al. 1997; Hosono et al. 2012; Kriajevska 
et al. 1994; Vasioukhin 2006). The Lgl protein was initially identified as a tumor 
suppressor protein in fruit fly and then implicated in regulating myosin II activity to 
control epithelial cell polarization and asymmetric cell division in higher organ- 
isms. Lgl forms a complex with C-terminal ends of myosin II heavy chains in the 
cells. However, the stability of this Lgl-myosin II complex depends on MHC phos- 
phorylation by PKC in the cells (Betschinger et al. 2005; Kalmes et al. 1996; Plant 
et al. 2003; Strand et al. 1994). Thus, the Lgl protein by interacting with coiled-coil 
regions of the MHC controls myosin II filamentation and localization in proliferat- 
ing cells (Dahan et al. 2012; De Lorenzo et al. 1999). Deletion of the Lgl protein 
encoding gene located on human chromosome 17 is implicated in the development
of Smith-Magenis syndrome, a developmental disorder that is known to affect many
parts of the body and to cause intellectual disability and sleep disturbances (De Leernyder et al.
2001; Smith et al. 1986). However, the role of mutant Lgl protein in controlling 
MHC phosphorylation and non-muscle myosin II cellular functions is not clearly 
understood. The metastasis factor mts1 (also known as S100A4 or calvasculin), which
belongs to the S100 family of calcium-binding proteins, interacts with C-terminal
ends of the MHC of non-muscle myosin II in the cells. Interaction of mts1 with
C-terminal ends of the MHC of myosin IIA promotes phosphorylation on S1943 and
disassembly of myosin II filaments; however, the underlying mechanisms remain
elusive to date (Badyal et al. 2011; Kiss et al. 2012; Li et al. 2003; Mitsuhashi
et al. 2011). S100P, or migration-inducing gene 9 protein (MIG9), another member of
the S100 family of calcium-binding proteins and a novel therapeutic target for
cancer, is known to interact with the MHC of non-muscle myosin II motor proteins in
the cells. MIG9 is overexpressed in many cancer cells and is known to interact with
the MHC of myosin IIA to induce disassembly of myosin II filaments in migrating
cells (Du
et al. 2012). Myosin-binding protein H (MYBPH) is associated with measles dis- 
ease. MYBPH interacts with ROCK1 and MHC to control RLC phosphorylation 
and myosin II filamentation in migrating cells; however, the underlying mecha- 
nisms remain elusive to date (Hosono et al. 2012). It has been shown that RLC 
phosphorylation and myosin II filamentation are essential for myosin II activation 
and cell migration, but recent studies suggest that unassembled myosin II with
phosphorylated RLC regulates the initiation of focal adhesion complex formation and
lamellipodia extension during cell migration (Shutova et al. 2012). This may
suggest that non-muscle myosin II motor proteins perform cellular functions without
undergoing filamentation. It would be interesting to know how cells coordinate
regulation of RLC and MHC phosphorylation to control non-muscle myosin II
filamentation and cellular functions. Tropomyosin, a two-stranded alpha-helical
coiled-coil protein, is an integral part of the actin cytoskeleton system in the cells.
This cellular protein interacts with myosin II filaments in muscle and non-muscle 
cells. Tropomyosin is implicated in regulating myosin II localization to plasma 
membrane and stress fiber formation (Bryce et al. 2003). Supervillin, an actin fila- 
ment binding and cell membrane-associated scaffolding protein, has been impli- 
cated in regulating non-muscle myosin II motor activity. Supervillin by interacting 
with MLCK controls RLC phosphorylation and myosin II activity in the cells 
(Takizawa et al. 2007). Thus, many cellular proteins at different levels are involved 
in regulating the activity of non-muscle myosin II motor proteins to mediate a wide 
variety of cellular functions. 





5.12 Non-muscle Myosin II Motor Proteins 
Predispose Humans to Diseases 


Plants can grow, develop, and live a normal life without class II myosin motor
proteins; however, other eukaryotes including humans require these multifunctional
molecular machines for their proper growth, development, and survival. We now
know that MYH9 germline-ablated mice, which lack expression of the non-muscle
myosin IIA motor protein, die on embryonic day (E) 6.5 because of defective cell-cell
interaction and lack of polarized visceral endoderm (Conti et al. 2004). MYH10
germline-ablated mice, which lack expression of another non-muscle myosin II
motor protein, myosin IIB, survive only until E14.5 because of brain and cardiac
developmental defects (Tullio et al. 1997, 2001). Although MYH14-ablated mice
that express no myosin IIC motor protein live to adulthood with no obvious
developmental defects, these mice require the expression of another
non-muscle myosin II motor protein, myosin IIB (Ma et al. 2010). We now have
enough evidence for the occurrence of mutations, misregulation, deletion, and 
alternative splicing of MYH9, MYH10, and MYH14. The consequences of these
changes at the genetic level are the onset and progression of a wide variety of
pathological conditions in humans. Nearly four dozen mutations are reported in myosin IIA
encoding gene MYH9 to date (Saposnik et al. 2014). Interestingly, some of these 
mutations in MYH9 have been implicated in the development of a large number of 
autosomal-dominant disorders such as May-Hegglin anomaly, Sebastian platelet 
syndrome, Bernard-Soulier syndrome, Fechtner syndrome, Epstein syndrome, and 
Alport syndrome. These myosin IIA-linked autosomal diseases, which often worsen
later in a patient’s life, are collectively called MYH9-related diseases (MYH9RD)
(Balduini et al. 2011; Burt et al. 2008; Kelley et al. 2000; Pecci et al. 2008). The
MYH9RD patients that carry R702C/H and R1165C/L mutations in the motor
domain of non-muscle myosin IIA develop deafness, cataract, Döhle-like inclusions,
nephritis, and thrombocytopenia with enlarged platelets in their middle age (De
Rocco et al. 2013; Pecci et al. 2008, 2014). In addition, most MYH9RD patients
develop renal diseases in their early adulthood. The circulating white blood cells of
the MYH9RD patients carry non-muscle myosin IIA clumps with no cellular
functions. Patients carrying D1424H/N/Y, V1516M, E1841K, and R1933X mutations in
the tail domain of non-muscle myosin IIA live a normal life with no symptoms of
clinical relevance (Pecci et al. 2010). Abnormal expression of non-muscle myosin II
motor proteins has been implicated in predisposing humans to diseases. For instance,
overexpression of myosin IIA was thought to increase cancer cell migration and 
metastasis as well as lung and kidney tumor invasion (Derycke et al. 2011; Gupton 
and Waterman-Storer 2006; Xia et al. 2012); however, this hypothesis has lost
support because of recent reports of myosin IIA roles in the posttranscriptional
stabilization of tumor suppressor protein p53 and repression of squamous cell carcinoma
in mice (Schramek et al. 2014). More than a decade ago, a chimeric MYH9-ALK
transcript formed by the fusion of MYH9 and ALK (anaplastic lymphoma kinase) 
was reported in anaplastic large cell lymphoma; however, its disease relevance is not 
clearly established to date (Lamant et al. 2003). Polymorphisms in the MYH9 and 
adjacent APOL1 (apolipoprotein L1) have been implicated in the development of 
nondiabetic chronic kidney disease in African-Americans (O’Seaghdha et al. 2011). 
Although mutations in MYH10 with relevance to human diseases with any clinical
symptom have not been reported, recently an E908X de novo mutation was identified in
patients with microcephaly, hydrocephalus, and cerebral and cerebellar atrophy.
Polymorphisms in MYH10 and some other genes have been linked to the develop- 
ment of abnormal heart with enlarged left atrium which was reported in a Caribbean 
Hispanic patient (Wang et al. 2010a). No direct link has been established between
myosin IIB expression and disease development to date. However, we now have
enough evidence for establishing an indirect link between the expression of
myosin IIB and the progression of a wide variety of pathological conditions including
megakaryopoiesis, myocardial infarction, scar tissue formation, demyelination, and
juvenile-onset neuronal ceroid lipofuscinosis (JNCL) or Batten disease (Antony-
Debre et al. 2012). JNCL or Batten disease is a lysosomal storage disorder caused by
mutations in CLN3, which encodes a lysosomal membrane-binding chaperone that is
known to bind non-muscle myosin IIB motor protein in the cells. Mutations in CLN3
not only inhibit its interaction with myosin IIB motor protein but also affect
retrograde and anterograde trafficking in the Golgi complexes (Getty et al. 2011).
Although the role of myosin IIB is not clearly established, many patients with CLN3
mutations show symptoms of dementia, seizures, loss of vision, and psychomotor
disturbances (Cotman and Staropoli 2012). Abnormal regulation of non-muscle myosin
IIC motor protein is also implicated in the development and progression of several 
diseases. For instance, mutations in MYH14 have been linked to the development of
hoarseness, hereditary deafness (DFNA4), myopathy, and peripheral neuropathy
(Choi et al. 2011; Donaudy et al. 2004). Expression of aberrant splicing products of
MYH14 has been implicated in the development and progression of myotonic
dystrophy type 1 (DM1), a multisystem genetic disorder that is known to affect 1 in
8000 people worldwide (Kumar et al. 2013; Rinaldi et al. 2012).

Abnormal regulation of myosin II-interacting proteins is also implicated in the
development and progression of many diseased conditions in humans. Overexpression
of ROCK and Mts1, which are known to regulate myosin II phosphorylation and
filamentation, is implicated in causing enhanced cancer cell migration, an essential step
in metastasis and invasion (Boye and Maelandsmo 2010; Kim and Adelstein 2011;
Sandquist et al. 2006). Mutations in RLC are known to affect the male courtship
song in fruit flies (Chakravorty et al. 2014). Mutations in MYLK, which encodes an RLC
phosphorylating kinase, have been implicated in cancer development and progression
(Greenman et al. 2007). Mutations in MYLK are linked to the development of 
Marfan syndrome and Ehlers-Danlos syndrome. Patients with these syndromes 
develop familial aortic dissections (FAD), that is, tears in the aortic wall that may
cause sudden death (Wang et al. 2010b). In addition, race-specific single nucleotide 
polymorphism variants of MYLK are implicated in the development and progression 
of asthma, acute lung injury, and sepsis (Flores et al. 2007; Gao et al. 2006, 2007). 
Mutations in TRPM6 that encodes an ion-channel kinase involved in phosphorylat- 
ing myosin II heavy chain are known to cause hypomagnesemia in patients with 
secondary hypocalcemia (Schlingmann et al. 2002; Walder et al. 2002). Abnormal 
regulation of another ion-channel kinase TRPM7 that is known to phosphorylate 
myosin II heavy chain has been linked to Guamanian amyotrophic lateral sclerosis 
and parkinsonian dementia (ALS/PD), various forms of neoplasia, hypertension, and 
delayed neuronal death following cerebral ischemia (Bates-Withers et al. 2011). 

Many pathogens are believed to manipulate and hijack these multifunctional 
molecular machines in order to promote their infection and pathogenesis. For 
instance, herpes simplex virus type 1 is believed to hijack myosin II motor protein for
promoting its egress (Arii et al. 2010; van Leeuwen et al. 2002), murine leukemia
virus for efficient infection (Lehmann et al. 2005), and Salmonella bacteria for
promoting growth in macrophages (Wasylnka et al. 2008); however, the underlying
mechanisms are not clearly understood to date. Another pathogen, Kaposi’s
sarcoma-associated herpesvirus, which is known to cause an AIDS-related neoplasm, is
believed to manipulate non-muscle myosin II and the E3-ubiquitin ligase c-Cbl-mediated
signaling pathway to induce macropinocytosis as a part of its mechanism to infect
blood vessels (Sharma-Walia et al. 2010). Interestingly, certain pathogens like HIV-1,
which is known to cause renal disease, are believed to inactivate non-muscle myosin IIA
motor protein selectively in the kidney in order to escape clearance through urine
(Hays et al. 2012). Dengue virus type 2, a mosquito-borne single positive-stranded RNA virus,
stimulates Rac1- and Cdc42-mediated signaling pathway to activate myosin II motor 
proteins for successful infection of host cells (Zamudio-Meza et al. 2009). Respiratory 
syncytial virus (RSV) that is known to cause runny nose, cough, headache, and 
severe respiratory tract infections is believed to activate non-muscle myosin II motor 
protein and other cytoskeletal proteins for rapid and efficient internalization during 
infection (Krzyzaniak et al. 2013). Non-muscle myosin II motor proteins play
essential roles in maintaining the host defense system. However, certain pathogens
including hepatitis C virus weaken the host defense system by inducing the development
of autoantibodies against the non-muscle myosin IIA motor protein, perhaps as a part
of an escape strategy from the host defense network (von Muhlen et al. 1995).





5.13 Conclusions and Future Perspectives 


If the transfer of energy from molecule to molecule and from one object to another
is the key to life, then efficiency is undoubtedly the essence of life. Here,
we have natural molecular machines as the most efficient convertors of cellular free 
energy into biological work that is essential for the sustenance of life. Among all the 
biological motor proteins known to date, perhaps class II myosins, especially
non-muscle myosin IIA, myosin IIB, and myosin IIC motor proteins, have emerged as the
main mechanotransducers of cellular free energy into work that is necessary for
performing multiple biological processes ranging from birth to death in the life of
mammals. Although the discovery of the first motor protein dates back to the
nineteenth century, research done during the past 30 years has led us to understand
much about the underlying mechanisms of several myosin II-mediated cellular processes in many
biological systems. We now know enough about these molecular machines and have 
proven beyond doubt that murine life does not exist without the expression of non- 
muscle myosin II motor proteins (Conti and Adelstein 2008). Interestingly, many 
patients with abnormally regulated and mutated non-muscle myosin IIA, myosin IIB,
and myosin IIC motor proteins are reported; however, no patient lacking these
biological molecular machines has been described. Cautious extrapolation of such
findings may suggest that life in humans and other mammals cannot exist without the
expression of non-muscle myosin II motor proteins. Therefore, the emergence of myosin II encoding
genes is probably a turning point in the evolution of mammals. Although nature 
chose not to give myosin II genes to plants during evolution, mammals have acquired 
Myh10, Myh11, and Myh14 genes with a significant homology in nucleotide
sequence. It is largely accepted that the expression of all three functional non-muscle 
myosin II motor proteins in humans is required to maintain proper growth, develop- 
ment, and immunity. Each cell and tissue type in humans is known to display differ- 
ential expression of these myosin II paralogs; however, the reasons behind this 
differential expression remain elusive to date. Part of the reason could be their
specialization in mediating dedicated cellular functions that are specific to each cell 
and tissue type. However, this hypothesis will benefit from further understanding of 
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structural and posttranslational modifications of these molecular machines. We have 
made progress in identifying dozens of mutations in myosin II motors proteins and 
their regulating proteins; however, not much is known about many myosin IT muta- 
tions and their clinical relevance to date. Therefore, development of novel strategies 
for the management and diagnosis of MYH9RD patients are necessary (Althaus and 
Greinacher 2010). We need to improve our current understanding of MYH9RD 
patients in order to develop myosin I-based novel therapeutic approaches in future. 
Although it is not unequivocally proven yet, many modern cancer cell biologists
believe that non-muscle myosin II motor proteins, which are known to drive cell
migration and cytokinesis, go awry in cancer and other diseased conditions. Overexpression
of a particular myosin II motor protein has been implicated in mediating cancer
progression and metastasis; however, further understanding of the expression profile of
each motor protein in every cancer type is necessary to design and develop myosin
II-based therapeutics in future. Also, we need to increase our limited knowledge of
the expression of chimeric and alternative splicing products of myosin II motor
proteins in pathological conditions in order to develop treatment options. During the past
30 years, we have made significant advances in understanding myosin II motor
proteins in normal physiological conditions; however, we have made very limited progress
in understanding how pathogens hijack myosin II motor proteins for their efficient
infection and propagation. Therefore, understanding what makes these dedicated
biological molecular machines work for the interests of pathogens is a considerable
challenge to modern cell biologists. We still have limited knowledge of how myosin II
motor proteins mediate the release of extracellular microvesicles, which are known to
mediate intercellular communication and promote progression of many human diseases.
Non-muscle myosin II-mediated mechanotransduction has been implicated in stem
cell proliferation and differentiation (Chen et al. 2014); however, very little is known
about the mechanical roles of myosin II paralogs. Therefore, further understanding of
these biological machines will have a significant impact on stem cell-based tissue
engineering, synthetic bioengineering, and therapeutic development.
Bioinformatics Databases:
Implications in Human Health 


Leena Rawal, Deepak Panwar, and Sher Ali 


6.1 Introduction 


Understanding genetic variations in the human genome and their contribution to
phenotypic change is one of the paramount goals in biology and medicine. The
completion of the human genome sequence has made information readily available in
large public databases, thereby allowing researchers to identify and characterize
naturally occurring variations in the human DNA sequence across individuals.
Genome-wide association studies (GWAS) are among the most widely used analyses
for investigating variants. Progress in genome sequencing technologies such as
high-throughput genotyping, next-generation sequencing, RNA expression, exome
sequencing, and massively parallel sequencing has accelerated the exploration of
genetic variations involved in human diseases.
Advancements in these technologies have made it possible to record both moderate
and extensive variations in the genomic architecture and to assess the events of
transcripts in disease or control populations. The analysis of the data is usually
decomposed into five distinct steps: (1) quality assessment of the raw data, (2) read
alignment to a reference genome, (3) variant identification, (4) annotation of the
variants, and (5) data visualization (Pabinger et al. 2014) (Fig. 6.1).

Fig. 6.1 A roadmap for whole-exome and whole-genome sequencing projects. After library
preparation, samples are sequenced on a given platform. The following steps include quality
assessment, read alignment against a reference genome, and variant identification. The annotation
of the identified mutations is done to infer the biological relevance, and results can be displayed
using dedicated tools. The detected mutations can be prioritized and filtered, followed by
validation of the generated results in the lab (Adapted from Pabinger et al. 2014)

As the genome evolves, newly found genes with novel characteristics create
phenotypic and genetic diversity in species. To ascertain the functions of the new
genes, gene-gene interaction (GGI) networks are integrated across their homologues
and ancestral genes to infer their corresponding biological roles.
Direct gene-gene and protein-protein interactions (PPIs) are among the strongest
manifestations of the functional relation between genes and their interacting
partners. In-depth understanding of the complete network of protein-protein
interactions, i.e., the number, type, and distribution, including the occurrence of key nodes
in these networks, would open new avenues into the structures and properties of
biological systems. Thus, bioinformatics and advanced computational tools and
software have become important for the analysis of the large amount of data derived
from such interactions. The purpose of this chapter is to make readers aware of
the bioinformatic tools and related software that identify interactions within and
between the genome (G), transcriptome (T), and phenome (P) that eventually have
an impact on human health.
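
The notion of key nodes in an interaction network can be made concrete with a
small sketch. Assuming that a network is supplied as a list of undirected protein
pairs, the Python fragment below counts the interaction partners (the degree) of
each protein and reports those at or above a chosen cut-off as candidate hubs. The
function names, the toy edge list, and the cut-off are hypothetical and serve only
as an illustration; degree is just one of several possible measures of node
importance.

```python
from collections import defaultdict


def degree_by_node(edges):
    """Count distinct interaction partners for every node of an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hubs(edges, min_degree):
    """Return, in alphabetical order, the nodes with at least min_degree partners."""
    degrees = degree_by_node(edges)
    return sorted(node for node, degree in degrees.items() if degree >= min_degree)


# Hypothetical toy network; P1..P5 are placeholder protein names.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

if __name__ == "__main__":
    print(degree_by_node(toy_edges))
    print(hubs(toy_edges, min_degree=3))
```

In this toy network only P1 reaches three partners; on real interaction data the
cut-off would have to be chosen with the degree distribution of the network in mind.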





6.2 Essence of Bioinformatics 


In the early 1990s, two new technological developments, high-throughput DNA
sequencing and the Internet, allowed for an overwhelming explosion of biological
data and its global dissemination. As a result of the former, whole-genome
sequencing was made feasible. In quick succession, the genomes of two bacteria, Haemophilus
influenzae (Fleischmann et al. 1995) and Mycoplasma genitalium (Fraser et al. 1995),
in 1995; an archaeon, Methanococcus jannaschii (Bult et al. 1996), and a yeast,
Saccharomyces cerevisiae (Goffeau et al. 1996), in 1996; a nematode, Caenorhabditis
elegans (C. elegans Sequencing Consortium 1998), in 1998; the fruit fly, Drosophila
melanogaster (Adams et al. 2000), in 2000; and finally a human, Homo sapiens
(International Human Genome Sequencing Consortium 2004; Lander et al. 2001;
Venter et al. 2001), in 2001 were sequenced.

Since then thousands of genomes have been sequenced; the central databases such
as GenBank at the National Center for Biotechnology Information (NCBI;
http://www.ncbi.nlm.nih.gov/), the DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp/),
and the European Molecular Biology Laboratory (EMBL; http://www.ebi.ac.uk/embl/)
are simultaneously updated. Based on the sequence and the corresponding organism
information, data are deposited in specific sequence repositories, from
which information on the human genome can be retrieved through the Genome
browser (http://genome.ucsc.edu), Ensembl (http://www.ensembl.org), and the Golden
Path server (http://genome.ucsc.edu/). Additionally, information on novel genes
can be retrieved from other resources such as UniGene and RefSeq (accessible at
NCBI) (Mount and Pandey 2005). Thus, as the sequencing technology blossomed, so
did the field of bioinformatics.

Bioinformatics, often referred to as computational biology, is the application of
computer science, statistics, applied mathematics, and information technology to the
study of biology and biological problems. Its interdisciplinary approach provides
unique ways to extract novel biological information from the analysis of sequence
data. The ability to process large data sets has made bioinformatics useful in different
biological fields, including comparative sequence analysis, genomics, biological literature
analysis, macromolecular sequence analysis, metagenomics, phylogenetic studies,
sequence motif analysis, and transcriptional regulation. Therefore, with
the wide availability of sequence data, we can now gain previously unattainable insights into
the evolution of the genome, the protein world, domain-domain cross talk, and interaction
networks.





6.3 Genetic Variations and Databases 


With time and ever-changing environmental factors, the panorama of human genetics
has evolved to target complex diseases. Although variants in the genome take many
forms, the majority of them arise from two types of mutation events. The most
common variant type results from a single-base mutation, termed a single-nucleotide
polymorphism (SNP). Empirical studies identified common SNPs (>20%
minor allele frequency) at average intervals of 0.3-1 kb along the chromosomes between any two
individuals, which scales up to 5-10 million SNPs across the genome (Altshuler et al.
2000). It has been estimated that 50,000-200,000 SNPs may be biologically important
(Chanock 2001; de Bakker et al. 2005). SNPs occurring in the exons of genes that do not
alter protein primary structure are called "synonymous". SNPs in introns, regulatory,
and gene-distant regions can also be functionally important, primarily by affecting gene
regulation. A relatively common variant (MAF of 1-2%), G20210A, in the 3' UTR of
the prothrombin gene, F2, increases its expression, and carriers of the minor allele are at
significantly increased risk for venous thrombosis (Poort et al. 1996).
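
The two quantities used in this paragraph, minor allele frequency (MAF) and the
expected number of SNPs for a given average spacing, can be illustrated with a
short Python sketch. It assumes a biallelic site with genotypes given as
two-letter strings and a genome of roughly 3.3 billion bases; the function names
and the example genotypes are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for a biallelic site.

    genotypes is a list of two-letter strings such as "GA", one per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("site is not biallelic")
    if len(counts) < 2:
        return None, 0.0  # monomorphic site: no minor allele
    total = sum(counts.values())
    # Ties are broken alphabetically so that the result is deterministic.
    allele, count = min(counts.items(), key=lambda item: (item[1], item[0]))
    return allele, count / total


def expected_snp_count(genome_length_bp, mean_interval_bp):
    """Number of SNPs expected if one occurs every mean_interval_bp bases."""
    return genome_length_bp // mean_interval_bp


if __name__ == "__main__":
    # Hypothetical sample: 8 homozygous GG and 2 heterozygous GA individuals.
    sample = ["GG"] * 8 + ["GA"] * 2
    print(minor_allele_frequency(sample))
    # Assumed genome length of about 3.3 billion bases.
    print(expected_snp_count(3_300_000_000, 1000))
    print(expected_snp_count(3_300_000_000, 300))
```

With these assumptions, one SNP per 1 kb corresponds to about 3.3 million SNPs and
one per 0.3 kb to about 11 million, a range that brackets the estimate of 5-10
million quoted above.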

The other variant type results from the deletion or insertion of a stretch of
nucleotides, the so-called insertion/deletion (INDEL) polymorphisms. The most
common insertion/deletion events occur in repetitive sequence elements, namely,
variable number tandem repeats (VNTRs) and microsatellites. Nucleotide
substitutions in the genome have the potential to contribute directly to disease pathogenesis,
depending on where they occur. Large expansions of trinucleotide repeats can lead
to genomic instability, the classic example being fragile X syndrome. A
dinucleotide repeat (DG8S737) on chromosome 8 has been shown to be strongly associated with
prostate cancer in African-Americans (Cheng et al. 2008; Freedman et al. 2006),
though its functional importance is yet to be established. Although the impact of modest
variations in STR and VNTR length on disease remains to be determined,
evidence suggests that some may act as binding sites for nuclear proteins (Richards
et al. 1993). Nucleotide substitutions in the protein-coding portions of genes
sometimes introduce premature stop codons, causing the termination of protein
translation. These often become alleles that are effectively null, as their transcribed
mRNA is rapidly degraded by nonsense-mediated decay (Lykke-Andersen 2001).
Additionally, gross chromosomal aberrations, for instance, deletions, inversions, or
translocations of large segments of DNA, have been associated with several
clinically characterized genomic syndromes. Few polymorphisms have a direct impact by
creating deleterious phenotypes. However, large deletions or duplications can be
quantified only through intensive cytogenetic methods (Gratacos et al. 2001).
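
How a single substitution can truncate a protein is easy to show in code. The
Python sketch below, which assumes an in-frame coding sequence on the sense strand
and the standard stop codons (TAA, TAG, TGA), checks whether a substitution
introduces a stop codon upstream of the original one. The function names and the
short example sequence are hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    for start in range(0, usable, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


def substitute(cds, position, new_base):
    """Return the sequence with the base at the 0-based position replaced."""
    return cds[:position] + new_base + cds[position + 1:]


def is_nonsense(cds, position, new_base):
    """True if the substitution creates a stop codon earlier than the original."""
    before = first_stop_codon(cds)
    after = first_stop_codon(substitute(cds, position, new_base))
    if after is None:
        return False
    return before is None or after < before


if __name__ == "__main__":
    cds = "ATGCAGTGGAAATAA"  # hypothetical: ATG CAG TGG AAA TAA
    print(is_nonsense(cds, 3, "T"))  # CAG -> TAG, premature stop
    print(is_nonsense(cds, 5, "A"))  # CAG -> CAA, synonymous change
```

The first substitution converts a glutamine codon into a stop codon, whereas the
second leaves the encoded amino acid unchanged, which is the "synonymous" case
described earlier in this section.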

In bioinformatics research, databases are the primary resources from which researchers
retrieve sequence data; a database may also pool data from other
repositories into a single databank. The major databases for mining
human genomic variations are listed in Table 6.1. These repositories are large
reservoirs of information on the genetic variations that cause disease in
the human genome. Combined analysis of the information retrieved from
these databases could lead to a better understanding of disease phenotypes and genotypes
and their relationship with mutations.

Table 6.1 Databases for mining different human genetic variations

Databases | Web links

Mutation databases
OMIM | http://www.ncbi.nlm.nih.gov/Omim/
HGMD | http://www.hgmd.org
GDB mutation way station | http://www.centralmutations.org/
HUGO mutation database initiative | http://www.genomic.unimelb.edu.au/mdi/

Central databases (SNPs and mutations)
HGV base | http://hgvbase.cgb.ki.se/
Sequence variation database (SRS) | http://srs.ebi.ac.uk/
dbSNP | http://www.ncbi.nlm.nih.gov/SNP/
The SNP consortium (TSC) | http://snp.cshl.org/

Genetic marker maps (microsatellites, STSs, other markers)
Marshfield maps | http://research.marshfieldclinic.org/genetics/
Genome database (GDB) | http://www.gdb.org
dbSTS | http://www.ncbi.nlm.nih.gov/STS/
UniSTS | http://www.ncbi.nlm.nih.gov/genome/sts/

Somatic and nonnuclear mutation databases
MitoMap | http://www.gen.emory.edu/mitomap.html
Mitelman Map | http://cgap.nci.nih.gov/Chromosomes/Mitelman

Gene-orientated SNP and mutation visualization
LocusLink | http://www.ncbi.nlm.nih.gov/LocusLink/
PicSNP | http://picsnp.org
Protein mutation database | http://www.genome.ad.jp/htbin/ , http://pmd.ddbj.nig.ac.jp/~pmd/
Go!Poly | http://61.139.84.5/gopoly/
GeneLynx | http://www.genelynx.org
SNPper | http://bio.chip.org:8080/bio/snpper-enter
GeneSNPs | http://www.genome.utah.edu/genesnps/
GAP SNP database | http://lpgws.nci.nih.gov/


6.4 Genome-Wide Association Studies (GWAS) 


In 2007, the modern era of complex genetics based on GWAS began, and since then GWAS have
been tremendously successful in identifying novel disease associations in
many common complex diseases (Wellcome Trust Case Control Consortium 2007).
The completion of the human genome (Lander et al. 2001) and HapMap projects
(The International HapMap Consortium 2003), coupled with advances in
high-throughput genotyping and innovative sequencing technology, has resulted in an
explosion of genome-wide association studies (GWAS). According to the NIH
(National Institutes of Health, USA), GWAS have allowed researchers to study
genetic variations across the human genome and the associations between different
genetic markers, phenotypes, and disease conditions.

In accordance with the common disease-common variant (CD-CV) hypothesis,
GWAS is based on the premise that common variants (SNPs) in the
general population will explain much of the heritability of common diseases
(Reich and Lander 2001; Schork et al. 2009). Validation and analyses of the CD-CV
hypothesis provide an insight into the genetic makeup of common diseases, e.g.,
rheumatoid arthritis, type 2 diabetes, or hypertension, to which multiple alleles
may contribute. If each common variant has only a minor effect but
common diseases are strongly heritable in families, then disease must be influenced
by multiple genetic factors. However, the frequency of each allele can vary between
groups of individuals, and there is a possibility that common variations can
increase an individual's susceptibility to a disease. GWAS is a pioneering approach that
has taken advantage of favorable genetic and technological advances to uncover several
novel disease associations with SNPs. As of November 16, 2013, the GWAS catalog
maintained by the National Human Genome Research Institute (NHGRI)
documented 11,907 SNPs and 940 traits with 15,052 disease associations. Thus, the
identification of many novel genes has uncovered pathways that are implicated in
disease etiology, leading to deeper insights into disease mechanisms. To help the
research community find relevant publications and further explore the
reported variants, NHGRI has established and maintains the NHGRI GWAS
Catalog (http://www.genome.gov/26525384), an online, regularly updated
database of single-nucleotide polymorphism (SNP)-trait associations from GWAS. A
bioinformatics tool, GWAS Integrator, offers a robust search capacity and a set of
data mining functions by integrating information from the NHGRI GWAS
Catalog with data from other established bioinformatics resources including
HapMap (http://hapmap.ncbi.nlm.nih.gov/), the Human Genome Epidemiology (HuGE)
Navigator (http://www.hugenavigator.net/), SNP Annotation and Proxy Search (SNAP)
(http://www.broadinstitute.org/mpg/snap/ldsearch.php), and the University of California
Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway).
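
At the level of a single SNP, the association that a GWAS tests can be sketched as
a comparison of allele counts between cases and controls. The Python fragment below
is a minimal illustration, assuming a 2 x 2 table of minor and major allele counts
and a Pearson chi-square test with one degree of freedom; the function name and the
counts are hypothetical, and real studies additionally correct for multiple
testing, population structure, and genotyping quality.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 df) on a 2 x 2 table of allele counts.

    Returns (chi_square, p_value, odds_ratio).
    """
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one allele")

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    if case_major == 0 or control_minor == 0:
        odds_ratio = math.inf
    else:
        odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    return chi_square, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts: 100 cases and 100 controls, two alleles each.
    print(allelic_association(60, 140, 40, 160))
```

Because a GWAS repeats such a test for hundreds of thousands of SNPs, a nominally
small p-value for one SNP is not sufficient evidence of association on its own.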

Among the many achievements of GWAS, the most significant are the use
of data mining software to model complex genotype-phenotype relationships, the
expanded networking among biologists and bioinformaticians, and the role of
biological databases in paving the way for genetic association studies. A roadmap
for the development of information analysis of GWAS is shown in
Fig. 6.2.

Fig. 6.2 Bioinformatics analyses of GWAS data. The use of filter and wrapper algorithms along
with computational modeling approaches is recommended in addition to parametric statistical
methods. Biological knowledge in public databases has a very important role to play at all levels
of the analysis and interpretation (Adapted from Moore et al. 2010)

Although GWAS have contributed many SNP and variant discoveries, the
genetic basis of most common diseases still remains unexplored. One possible
explanation for this "missing heritability" is that older GWAS methods
focused on one SNP at a time and failed to detect the heterogeneous complexity of
many genotype-phenotype relationships and gene-gene and gene-environment
interactions. GWAS have been used to study associations in many complex diseases, for
instance, Crohn's disease (Duerr et al. 2006; Libioulle et al. 2007), type 1 diabetes (T1D) and
rheumatoid arthritis (RA) (Wellcome Trust Case Control Consortium 2007), and
multiple sclerosis (MS) (De Jager et al. 2009). An overview of some GWAS-specific
tools used for nucleotide variant identification, annotation, and data
visualization is given in Table 6.2. Thus, advancements in GWAS will aid in the
identification of new genetic associations that will eventually help researchers to
use the information to develop better strategies to detect, treat, and prevent the
diseases.

Table 6.2 Tools for nucleotide variant identification and annotation

Tool | Descriptive feature | Web link

Tools for variant identification

Germline and somatic variations
CRISP | Comprehensive Read analysis for Identification of Single-Nucleotide Polymorphisms (SNPs) from Pooled sequencing (CRISP) that is able to identify both rare and common variants by using two approaches: comparing the distribution of allele counts across multiple pools using contingency tables and evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone | http://polymorphism.scripps.edu/~vbansal/software/CRISP/
Dindel | A Bayesian approach for calling small (<50 nucleotides) insertions and deletions from short read data |
GATK | Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce | http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
SAMtools | The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, and efficient in random access | http://samtools.sourceforge.net
SNVer | SNVer (single-nucleotide variant caller/seeker), a statistical tool for detecting variants in analysis of NGS data, employing a binomial-binomial model to test the significance of observed allele frequency against sequencing error | http://snver.sourceforge.net/
VarScan | Detection of somatic mutations and copy number alterations (CNAs) in exome data | http://varscan.sourceforge.net

CNV identification
CNVnator | Tool for identifying, genotyping, and characterizing CNVs | http://sv.gersteinlab.org/
CONTRA | Copy number targeted resequencing analysis (CONTRA), for targeted resequencing data such as those from whole-exome capture data | http://contra-cnv.sourceforge.net/
ExomeCNV | ExomeCNV is an R package tailored to detection of copy number variants (CNV) and loss of heterozygosity (LOH) from exome sequencing data. It exploits the unique discrete feature of exon definitions and incredible cross sample consistency of depth of coverage. ExomeCNV is most suitable when paired samples (e.g., tumor-normal pair) are available |
RDXplorer | Tool for copy number variant (CNV) detection in whole human genome sequence data using read depth (RD) coverage | http://rdxplorer.sourceforge.net/

Sequence variation identification
BreakDancer | Tool can detect deletions, insertions, inversions, and intra- and inter-chromosomal translocations and computes the copy number |
BreakPointer | A fast algorithm to locate breakpoints of structural variants (SVs) from single-end reads produced by next-generation sequencing | https://github.com/ruping/Breakpointer
CLEVER | Authors provide structured documentation | http://clever-sv.googlecode.com
GASVPro | Software to detect SVs from paired-end mapping data | http://compbio.cs.brown.edu/projects/gasv/
SVMerge | SVMerge integrates calls from several existing SV callers: BreakDancerMax, Pindel, RDXplorer, CnD, and SECluster | http://svmerge.sourceforge.net/

Tools for variant annotation
ANNOVAR | Integrated tool providing gene annotation at single-nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, and finding variants in conserved regions | http://www.openbioinformatics.org/annovar/
AnnTools | Provides a set of helper tools for custom annotation | http://anntools.sourceforge.net/
NGS-SNP | Provides rich annotations for SNPs identified by the sequencing of whole genomes from any organism with reference sequences in Ensembl | http://stothard.afns.ualberta.ca/downloads/NGS-SNP/
SeattleSeq Annotation | The SeattleSeq Annotation server provides annotation of single-nucleotide variants (SNVs) and small indels, both known and novel. This annotation includes dbSNP rs ID, gene names and accession numbers, variation functions (e.g., missense), protein positions and amino acid changes, conservation scores, HapMap frequencies, PolyPhen predictions, and clinical association | http://snp.gs.washington.edu/SeattleSeqAnnotation138/
SNPeffect 4.0 | SNPeffect primarily focuses on the molecular characterization and annotation of disease and polymorphism variants in the human proteome | http://snpeffect.switchlab.org/
VARIANT | VARIant ANalysis Tool (VARIANT) reports information on the variants found that include consequence type and annotations taken from different databases and repositories | http://variant.bioinfo.cipf.es

6.5 Gene-Gene Interactions


Complex diseases have adversely affected human health throughout the world.
Several studies have reported that complex diseases are caused by multiple loci
(Kraft and Cox 2008; Seng and Seng 2008). Following the identification of
disease-susceptibility polymorphisms by GWAS using standard single-locus
analyses, attention is shifting towards the detection of interactions of a gene with another
gene or with other cellular factors.

Gene-environment interactions (G x E) are likely to affect complex phenotypes.
Individuals with predisposing genetics are more likely to develop a disease when
exposed to an altered environment than individuals exposed to the same environment
without predisposing genetics (Cambien et al. 1992; Jacques et al. 1996). In addition to
interactions between genes and environment, interactions among different genetic loci
(G x G) can also influence disease risk. G x G interactions are defined as epistatic when the
allelic variations of one gene alter the effect of variations of another gene (Musani et al.
2007). Epistasis has been identified in human diseases, and its role in public health has
been highlighted (Small et al. 2002; Howard et al. 2002). On the other hand, if a genetic
factor functions primarily through a complex mechanism that involves multiple genes
and environmental factors, the effect might be missed when the gene is examined in
isolation without allowing for its potential interactions with other unknown factors.
Therefore, it is important to explore gene-gene and/or gene-environment
interactions in order to understand the genetic etiology of complex diseases.
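
Why an effect can be missed when a gene is examined in isolation is illustrated by
the following Python sketch. It uses a hypothetical two-locus penetrance table in
which disease risk depends only on the combination of genotypes, together with
assumed genotype frequencies of 0.25, 0.5, and 0.25 at each locus (Hardy-Weinberg
proportions for an allele frequency of 0.5). None of the numbers come from a real
disease; they are chosen so that the marginal risk at each locus is identical for
all three genotypes.

```python
# Assumed genotype frequencies (AA, Aa, aa) at each locus.
GENOTYPE_FREQ = [0.25, 0.5, 0.25]

# Hypothetical penetrance table: rows are genotypes at locus A (AA, Aa, aa),
# columns are genotypes at locus B (BB, Bb, bb).
PENETRANCE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def marginal_risk_locus_a(penetrance, freq_b):
    """Disease risk for each genotype at locus A, averaged over locus B."""
    return [sum(p * f for p, f in zip(row, freq_b)) for row in penetrance]


def marginal_risk_locus_b(penetrance, freq_a):
    """Disease risk for each genotype at locus B, averaged over locus A."""
    n_columns = len(penetrance[0])
    return [
        sum(penetrance[i][j] * freq_a[i] for i in range(len(penetrance)))
        for j in range(n_columns)
    ]


if __name__ == "__main__":
    print(marginal_risk_locus_a(PENETRANCE, GENOTYPE_FREQ))
    print(marginal_risk_locus_b(PENETRANCE, GENOTYPE_FREQ))
```

Each single-locus analysis sees the same risk (0.05) for every genotype and
therefore no association, although the joint genotype changes the risk from 0 to
0.1. This is an extreme, purely epistatic case, but it shows why single-locus
tests alone cannot recover such interactions.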

A number of methods and several computational tools have been developed for
gene prioritization based on sequence-based features, gene-expression data, and
functional annotation (Table 6.3). Many other approaches have also been
proposed for the identification of gene-gene and/or gene-environment interactions.
Much of the known information on gene-phenotype association is distributed across
various databases that support variant-filtering strategies (Table 6.4). The outcomes
of these analyses could lead to new genetic findings that account for the
heritability of human diseases and provide novel insights into the underlying genetic
etiology through bench science research and clinical applications.


Table 6.3 Web resources and algorithms for gene—gene interaction and gene prioritization 





Tools Description | Web link 
CAESAR CAndidatE Search And Rank (CAESAR) is a | http://visionlab.bio.unc. 
tool for prioritizing candidate genes for complex | edu/caesar/ 
traits. CAESAR exploits the knowledge of 
complex traits in literature by using ontologies 
to semantically map the trait information to 
gene and protein centric information from 
| several different public data sources | 
CANDID CANDID is a genome-wide candidate | https://dsgweb.wustl.edu/ 
identification and prioritization algorithm that —_| hutz/candid.html 


| uses a several heterogeneous data sources, some 
of them chosen to overcome bias due to 

| previous knowledge of the user or against 
poorly characterized genes 
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Tools 


DiseaseNet 


Description 


DiseaseNet is a platform for analyzing 
disease-associated protein-protein interaction 
(PPI) networks. This tool can be used to obtain 
disease-associated gene information and the 
cross talk with other diseases through PPI 
networks. Disease—gene network is defined as 
the network that is constructed from the known 
interactions between the genes implicated in 
different diseases 


Web link 


http://bioschool.iitd.ac.in/ 
DiseaseNet/ 





ENDEAVOR 


G2D 


ENDEAVOUR is a web resource for the 
prioritization of candidate genes that uses a 
training set of genes known to be involved in a 
biological process of interest. The approach 
consists of (1) inferring several models (based 
on various genomic data sources), (2) applying 
each model to the candidate genes to rank those 
candidates against the profile of the known 
genes, and (3) merging the several rankings 
into a global ranking of the candidate genes 


G2D (genes to diseases) is a web resource for 
prioritizing candidates genes for inherited 
diseases. It uses three algorithms based on 
different prioritization strategies. Candidate 
genes are prioritized according to their possible 
relation to an inherited disease using a 
combination of data mining on biomedical 
databases and gene sequence analysis. The 
input to the server is the genomic region 

where the user is looking for the disease- 
causing mutation, plus an additional piece 

of information depending on the algorithm used 


http://www.esat. 
kuleuven.be/endeavour 


http://www.ogic.ca/ 
projects/g2d_2/ 





GeneDistiller 


GeneDistiller provides knowledge-driven, fully 
interactive, and intuitive access to multiple data 
sources. It uses information from various data 
sources such as gene—phenotype associations, 
gene-expression patterns, and protein-protein 
interactions 


http://www.genedistiller. 
org/ 





Gene Prospector 


Gene Prospector is a web-based application that 
selects and prioritizes potential disease-related 
genes by using a highly curated and updated 
literature database of genetic association 
studies. Gene Prospector provides an online 
gateway for searching for evidence about 
human genes in relation to diseases and other 
phenotypes 


http://www. 
hugenavigator.net/ 
HuGENavigator/ 
geneProspectorStartPage. 
do 





Gene Wanderer 





GeneWanderer is a candidate disease—gene 
prioritization algorithm based on protein— 
protein interaction 





http://compbio.charite.de/ 
genewanderer 
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Tools 


Gentrepid 


Description 


Public candidate disease—gene prediction 
system that associates genes with specified 
phenotypes using genetic and bimolecular data. 
Gentrepid draws on two gene clustering 
methods to make candidate gene predictions: 
the Common Pathway Scanning (CPS) and 
Common Module Profiling (CMP) approaches 


Web link 
http://www.gentrepid.org/ 





MedSim 


MimMiner 


MedSim ranks candidate genes for a particular 
disease based on functional comparisons 
involving the gene ontology. MedSim uses 
functional annotations of known disease genes 
for assessing the similarity of diseases as well 
as the disease relevance of candidate genes 


MimMiner is a system for text-mining analysis 
of the human phenome that classifies human 
disease phenotypes from OMIM and phenotype 
similarities for similar human disease 
phenotypes at multiple levels of gene 
annotations 


http://funsimmat.bioinf. 
mpi-inf.mpg.de/qf.php 


http://www.cmbi.ru.nl/ 
MimMiner/cgi-bin/main.pl 





MORPHIN 


Prioritizes the most relevant human diseases for 
a given set of model organism genes, 
potentially highlighting new model systems for 
human diseases and providing context to model 
organism studies 


http://www.inetbio.org/ 
morphin 





PGMapper 


PhenoPred 


PGMapper is a software tool for automatically 
matching phenotype to genes from a defined 
genome region or a group of given genes by 
combining gene function information from the 
OMIM and PubMed databases. PGMapper is 
currently available for candidate gene search 
independently for human, mouse, rat, zebrafish, 
and 12 other species 


PhenoPred is an algorithm for detecting gene— 
disease associations based on a protein—protein 
interaction network, known gene—disease 
associations, protein sequences, and protein 
functional information at the molecular level. 
PhenoPred is supervised meaning that first each 
protein is mapped onto the spaces of disease 
and functional terms. In a second step, a 
support vector machine model is trained and 
used to detect gene—disease associations 


http://www. 
genediscovery.org/ 
pgmapper/index.jsp 


http://www.phenopred.org 





PRINCE
Description: PRIoritizatioN and Complex Elucidation (PRINCE) is a network-based approach for predicting causal genes and protein complexes that are involved in a disease of interest. PRINCE generalizes the standard network-based approaches by both considering the network signal in a global manner and going beyond single genes to the modules that are affected in a given disease.

SNPs3D
Description: SNPs3D is a web resource, coupled to a database that provides and integrates as much information as possible on disease-gene relationships at the molecular level. The SNPs3D resource has three primary modules. One of them identifies which genes are promising candidates for involvement in a specified disease
Web link: http://www.SNPs3D.org





ToppGene
Description: ToppGene is a gene prioritization method that combines mouse phenotype data with human gene annotations and literature. It ranks candidate genes based on a similarity score for each annotation of each candidate by comparing to the enriched terms in a given set of training genes
Web link: http://toppgene.cchmc.org/





Table 6.4 Data sources and gene prioritization tools depending upon the data type

Data type | Data content | Possible sources | Tools
Experiment, observation | Linkage, association, pedigree, relevant texts, and other data | User provided | CAESAR, CANDID, ENDEAVOR, G2D, Gentrepid, GeneDistiller, PGMapper, PRINCE, ToppGene
Sequence, structure, metadata | Sequence conservation, exon number, coding region length, known structural domains and sequence motifs, chromosomal location, protein localization, and other gene-centered information and predictions | SCOP, Pfam, PROSITE, UniProt, Entrez Gene, ENSEMBL, InterPro, LocDB, GeneCards, PredictProtein | CAESAR, CANDID, ENDEAVOR, G2D, Gentrepid, GeneDistiller, Gene Prospector, MedSim, MimMiner, PGMapper, PhenoPred, SNPs3D, ToppGene
Pathway, PPI, genetic linkage, expression | Disease-gene associations, pathways and gene-gene/protein-protein interactions predictions, and gene-expression data | KEGG, STRING, Reactome, DIP, BioGRID, GEO, ArrayExpress, ReLiance | CAESAR, CANDID, DiseaseNet, ENDEAVOR, G2D, Gentrepid, GeneDistiller, GeneWanderer, MedSim, PGMapper, PhenoPred, PRINCE, SNPs3D, ToppGene
Nonhuman data | Information about related genes and phenotypes in other species | OrthoDisease, OrthoMCL, MGD, Pathbase | CAESAR, CANDID, ENDEAVOR, GeneDistiller, Gene Prospector, GeneWanderer, MedSim, SNPs3D, ToppGene
Ontologies | Gene, disease, phenotype, and anatomic ontologies | GO, DO, MPO, HPO, eVOC | CAESAR, ENDEAVOR, G2D, GeneDistiller, MedSim, PhenoPred, Prioritizer, SNPs3D, ToppGene
Mutation associations and effects | Information about existing mutations, their functional and structural effects and their association with diseases, and predictions of functional or structural effects for the mutations in the gene in question | dbSNP, PMD, GAD, DMDM, SNAP, PolyDoms, SNPdbe, SNPselector, RAVEN, SNPeffect, PHD-SNP, Mutation@A Glance, PromoLign, SIFT, PolyPhen, PupaSNP finder, FASTSNP | CAESAR, CANDID, Gene Prospector, GeneWanderer, SNPs3D, SUSPECTS

6.6 Protein-Protein Interactions 


Exploring the structure and function of proteins has become an important field of
biological research. Proteomics, the large-scale analysis of proteins, includes the
identification, expression, and functional characterization of proteins, their
interactions, and any pre-, co-, and posttranslational modifications. The main
objective of proteomics is to assess biological processes and analyze protein-protein
interaction networks (Blackstock and Weir 1999). All biological processes are
orchestrated and regulated by proteins, which accomplish their functions by
interacting with other proteins, forming protein-protein interactions (PPIs).
Alterations in the PPI network can result in the development of a disease phenotype.

With the advent of high-throughput proteomics and the emergence of computational
biology, the focus of research has moved from the analysis of individual proteins to
monitoring protein-protein interactions at the level of the whole organism. Current
technology for characterizing and identifying the proteins of a cell encompasses
two-dimensional gel electrophoresis for separation and tandem mass spectrometry
for identification (Dove 1999). Other experimental methods, such as
co-immunoprecipitation (Co-IP), phage display, and the yeast two-hybrid system, are
often employed to detect interactions between cellular proteins (Fields and Song
1989; Bartel and Fields 1997; Uetz et al. 2000). Unfortunately, these experimental
approaches are tedious and potentially inaccurate, and they report only that an
interaction occurs, not its structural conformation (Enright et al. 1999).
To overcome this limitation, advanced computational algorithms from structural
bioinformatics are required. Computational prediction of protein-protein interactions
combines bioinformatics approaches with structural biology to ascertain interactions
at different levels. Such methods begin with a structural representation of each of
the constituent proteins (either experimentally solved structures or comparative
models) and attempt to predict whether or not two proteins will interact.

In PPI networks, nodes represent proteins (and, by extension, genes) and edges
represent physical protein-protein interactions. These include a range of interaction
types, such as hormone-receptor interactions, kinase-substrate interactions,
or the stable bond between proteins in the same complex. The human PPI network
consists mostly of proteins with very few interactions and a few proteins with a high
number of interactions. By comparison, in a randomly connected network, most
nodes have an average number of interactions, while some have many and some
have few. Although researchers can infer interactions by analyzing the
sequence of a gene or protein (Marcotte et al. 1999; Lu et al. 2002; Valencia and Pazos
2002; Salwinski and Eisenberg 2003), structural information is needed to
explore possible functions and to build interaction networks among proteins.
Elucidating the structures of protein complexes may provide interaction details that
are critical for understanding molecular processes (LoConte et al. 1999;
Chakrabarti and Janin 2002; Salwinski and Eisenberg 2003; Kortemme and Baker
2004). According to the study by Aloy and Russell (2004), about 10,000 different
types of PPIs are stored in the Protein Data Bank (PDB), while the number is
approximately 2000 for nonredundant interacting proteins. Based on these calculations,
at the current rate of structure determination, complete elucidation of the PPI
network would take 20 years (Aloy and Russell 2004). These interactions have neither
been interpreted in the context of genotype-phenotype correlation, nor is there
information on the diseased phenotype that results from a faulty protein interaction.
Efforts along these lines would add a much-needed dimension to our understanding of
the role of faulty proteins in disease development. The major publicly available
PPI databases include HPRD, IntAct, MINT, STRING, and BioGRID; these and
many others each have unique features, with large variation in architectural
design and annotation (Table 6.5). Such databases simplify the identification
of biological networks and help formulate hypotheses about protein function
and cellular mechanisms from the rapidly growing body of PPI data.
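
The contrast between a few highly connected proteins and many sparsely connected ones can be checked on any interaction list by counting partners per protein. The following is a minimal sketch, not a method from the cited studies; the protein names, the toy edge list, and the hub threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def degree_table(edges):
    """Return {protein: number of distinct interaction partners}."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(others) for protein, others in partners.items()}

def find_hubs(degrees, min_degree):
    """Proteins with at least min_degree partners (threshold is an assumption)."""
    return sorted(p for p, d in degrees.items() if d >= min_degree)

# Hypothetical toy network: one hub and several low-degree proteins.
edges = [
    ("HUB1", "P1"), ("HUB1", "P2"), ("HUB1", "P3"), ("HUB1", "P4"),
    ("P4", "P5"), ("P5", "P6"),
]

degrees = degree_table(edges)
histogram = Counter(degrees.values())  # degree -> number of proteins
hubs = find_hubs(degrees, min_degree=4)

for degree in sorted(histogram):
    print(f"{histogram[degree]} protein(s) with {degree} partner(s)")
print("hubs:", hubs)
```

On a real interactome exported from one of the databases in Table 6.5, the same histogram would be expected to show many proteins at low degree and a long tail of hubs, whereas a randomly wired network of the same size would cluster around the mean degree.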

Identification of the 3D structure of a protein is important for many areas, such as
drug design and protein modeling. The 3D structure of a protein determines its
function and can suggest possible interacting partners; therefore, advanced
techniques in drug development and analysis make wide use of computer-aided programs
for protein structure visualization. Some of the software and/or servers used for
visualizing 3D protein structure are listed in Table 6.6. The protein complexes formed by
physical interaction between proteins are the main elements responsible for cellular
functions. Thus, the identification of complex formation is necessary
to understand the structural organization of the cell. Computational methods such as
protein-protein docking (PPD) are required to study protein complexes at the
structural level. Such methods begin with a structural representation of each of the
constituent proteins and attempt to produce an accurate 3D model of the complete
complex. Thus, PPD methods give a clear understanding of (1) the nature of the
interactions between interacting proteins, (2) the three-dimensional conformation
adopted by interacting proteins, and (3) the strength of atomic interactions between
the interacting proteins.
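
One common way to describe the interface of a docked complex is to list residue pairs that have atoms within a distance cutoff. The sketch below assumes atoms are already available as (residue label, coordinates) pairs; the residue labels, coordinates, and the 4.0 Å cutoff are hypothetical illustration values, not data from any study discussed here.

```python
import math

def interface_residues(chain_a, chain_b, cutoff=4.0):
    """Return sorted residue pairs with at least one atom pair within cutoff.

    Each chain is a list of (residue_label, (x, y, z)) tuples, one per atom.
    The cutoff (in angstroms) is an assumed, adjustable parameter.
    """
    contacts = set()
    for res_a, xyz_a in chain_a:
        for res_b, xyz_b in chain_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                contacts.add((res_a, res_b))
    return sorted(contacts)

# Hypothetical atoms from two chains of a docked model.
chain_a = [("A:LYS10", (0.0, 0.0, 0.0)), ("A:GLU55", (20.0, 0.0, 0.0))]
chain_b = [("B:ASP7", (3.0, 0.0, 0.0)), ("B:ARG90", (40.0, 0.0, 0.0))]

pairs = interface_residues(chain_a, chain_b)
print(pairs)
```

The all-against-all loop is adequate for a sketch; for full-size complexes a spatial index would normally replace it.
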
Table 6.5 Available databases useful for PPI

Database | Description | URL
BIND | Peer-reviewed biomolecular interaction database containing published interactions and complexes | http://bind.ca/
BioGRID | Protein and genetic interactions from major model organism species | http://www.thebiogrid.org/
COGs | Orthology data and phylogenetic profiles | http://www.ncbi.nlm.nih.gov/COG
DIP | Experimentally determined interactions between proteins | http://dip.doe-mbi.ucla.edu/
HPRD | Human protein functions, PPIs, posttranslational modifications, enzyme-substrate relationships, and disease associations | http://www.hprd.org/
IntAct | Interaction data abstracted from literature or from direct data depositions by expert curators | http://www.ebi.ac.uk/intact/
iPfam | Physical interactions between those Pfam domains that have a representative structure in the Protein Data Bank (PDB) | http://ipfam.sanger.ac.uk/
MINT | Experimentally verified PPI mined from the scientific literature by expert curators | http://mint.bio.uniroma2.it/mint/
Predictome | Experimentally derived and computationally predicted functional linkages | http://visant.bu.edu/
ProLinks | The ProLinks database is a collection of inferences of functional linkages between proteins using four methods. These methods include the phylogenetic profile method, which uses the presence and absence of proteins across multiple genomes to detect functional linkages; the gene cluster method, which uses genome proximity to predict functional linkage; Rosetta Stone, which uses a gene fusion event in a second organism to infer functional relatedness; and the gene neighbor method, which uses both gene proximity and phylogenetic distribution to infer linkage | http://prl.mbi.ucla.edu/prlbeta/
SCOPPI | Domain-domain interactions and their interfaces derived from PDB structure files and SCOP domain definitions | http://www.scoppi.org/
STRING | Protein functional linkages from experimental data and computational predictions | http://string.embl.de/
PINA | Protein Interaction Network Analysis (PINA) platform is an integrated platform for protein interaction network construction, filtering, analysis, visualization, and management. It integrates protein-protein interaction data from six public curated databases (IntAct, BioGRID, MINT, DIP, HPRD, MIPS/MPact) and builds a complete, nonredundant protein interaction dataset for six model organisms. Moreover, it provides a variety of built-in tools to filter and analyze the network for gaining insight into the network | http://cbg.garvan.unsw.edu.au/pina/
MIPS/MPact | MIPS Mammalian Protein-Protein Interaction Database is a collection of manually curated high-quality PPI data collected from the scientific literature by expert curators | http://mips.helmholtz-muenchen.de/proj/ppi/
MiMI | Michigan Molecular Interactions (MiMI) database comprehensively includes protein interaction information that has been integrated and merged from diverse protein interaction databases and other biological sources | http://mimi.ncibi.org/MimiWeb/main-page.jsp
UniHI | Unified Human Interactome (UniHI) integrates human protein-protein and transcriptional regulatory interactions from 15 distinct resources. The UniHI database includes tools (1) to search for molecular interaction partners of query genes or proteins in the integrated dataset; (2) to inspect the origin, evidence, and functional annotation of retrieved proteins and interactions; (3) to visualize and adjust the resulting interaction network; (4) to filter interactions based on method of derivation, evidence, and type of experiment as well as based on gene-expression data or gene lists; and (5) to analyze the functional composition of interaction networks | http://www.unihi.org/








Table 6.6 Tools used for PPIs and their networks

Tool | Distinctive features | Webpage
BioLayout Express 3D | Facilitates microarray data analysis | http://www.biolayout.org/
Cytoscape | Versatile; implements many visualization algorithms; many plug-ins available | http://www.cytoscape.org/
Large graph layout | Especially useful for dynamic visualization of large graphs, force-directed layout algorithm | http://sourceforge.net/projects/lgl
Osprey | Provides network, connectivity filters, many layouts, and facilitates dataset superimposing | http://biodata.mshri.on.ca/osprey/servlet/Index
Pajek | Especially useful for the analysis of very large networks | http://vlado.fmf.uni-lj.si/pub/networks/pajek/
VisANT | Especially facilitates analysis of gene ontologies | http://visant.bu.edu/
yED | General purpose graph editor | http://www.yworks.com/products/yed/
Arena3D | 3D view of the network | http://arena3d.org
MEDUSA | It was specially designed and optimized for accessing protein interaction data from STRING database | http://coot.embl.de/medusa


Several algorithms and software packages for automatic protein-protein docking are
available (Table 6.7). In a recent study, we docked the human testis-specific
protein, Y-encoded (TSPY) with eukaryotic translation elongation factor 1 alpha 2
(eEF1A2) and mapped the interacting interface of the resulting complex (Fig. 6.3). PPD
does not forecast which proteins could interact, but it can predict how the proteins
interact. Although the docked complex with the lowest interaction energy is considered
the best solution given by PPD, other interaction energies are also involved in large
surface displacements in the docked conformation to finally form the protein complex.
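
Selecting the lowest-energy solution from a docking run amounts to sorting the candidate poses by their score. The sketch below illustrates only that ranking step; the pose names and energy values are hypothetical, and the score is treated as a generic "lower is better" number rather than the scoring function of any particular server in Table 6.7.

```python
from typing import NamedTuple

class Pose(NamedTuple):
    name: str
    energy: float  # lower means more favorable; arbitrary units (assumption)

def rank_poses(poses):
    """Return poses ordered from most to least favorable energy."""
    return sorted(poses, key=lambda pose: pose.energy)

# Hypothetical docking output: three candidate models with made-up scores.
poses = [
    Pose("model_1", -85.2),
    Pose("model_2", -120.7),
    Pose("model_3", -97.4),
]

ranked = rank_poses(poses)
best = ranked[0]
print("best pose:", best.name, best.energy)
```

In practice the top-ranked pose is a starting point rather than a final answer, so the full ranked list is usually kept for inspection of alternative conformations.
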
Table 6.7 Details of protein-protein docking servers

Tools | Description | Web link
3D-Dock Suite | Integrated approach to protein docking with FTDock, RPScore, and MultiDock | www.sbg.bio.ic.ac.uk/docking
3D-Garden | System for modeling protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm | www.sbg.bio.ic.ac.uk/~3dgarden
Bielefeld protein docking | It detects geometrical and chemical complementarities between surface of proteins and estimates docking positions | www.techfak.unibielefeld/~posch/DOCKING/install.html
BiGGER | Protein docking algorithm integrated in Chemera, a molecular graphics and modeling program for studying protein structures and interactions | www.cqfb.fct.unl.py/bioin/chemera
ClusPro | Integrated approach to protein docking with DOT and ZDOCK and PIPER | www.cluspro.bu.edu/
DOT | It computes the electrostatic potential energy between two given proteins or other charged molecules | www.sdsc.edu/CCMS/dot/
ZDOCK | Performs a full rigid-body search of docking orientations between two proteins including performance optimization and a novel pairwise statistical energy potential | www.zdock.umassmed.edu/software
PIPER | FFT-based docking with pairwise potentials | www.structure.bu.edu/content/protein-protein-docking
Escher NG | Enhanced version of the original ESCHER protein-protein automatic docking system developed in 1997 | www.ddl.unimi.it/escherng/index.htm
HADDOCK | High-ambiguity-driven biomolecular docking that employs biochemical and/or biophysical interaction data | www.nmr.chem.uu.nl/haddock
Hex | Protein docking and molecular superposition program | www.hex.loria.fr
RosettaDock | Predicts the structure of protein complexes given the structures of the individual components and an approximate binding orientation | www.rosettadock.graylab.jhu.edu
FireDock | Fast interaction refinement in molecular docking is a web server for flexible refinement and scoring of protein-protein docking solutions | www.bioinfo3d.cs.tau.ac.il/FireDock/
GRAMM-X | It is a public web server for protein-protein docking | www.vakser.bioinformatics.ku.edu/resources/gramm/grammx

Fig. 6.3 Docked model of the TSPY-eEF1A2 complex and its structural mapping of intermolecular
interfaces. (a) Cartoon representation of TSPY-eEF1A2 docked complex with low energy
score determined by HADDOCK docking. TSPY displayed in teal and eEF1A2 in red. (b) A close-up
of interacting interface TSPY-eEF1A2 complex is represented in surface representation.
Interacting residues from TSPY (yellow) and domain I and domain III of eEF1A2 are shown in
blue and magenta, respectively. (c) Interacting residues between TSPY and eEF1A2 obtained
through hydrogen bonding are shown in line representation, TSPY residues (yellow), and eEF1A2
(domain I blue and domain III magenta) with the atomic distances (green dotted lines) with labeled
residue name (Adapted from Panwar et al. 2015)








6.7 Domain-Domain Interactions 


As protein-protein interactions mostly occur via domains rather than the whole protein
surface, identification of domain-domain interactions (DDIs) is an imperative step
toward PPI prediction. Protein domains are compact regions within a protein's
structure that possess a distinct function. Observed protein domains rarely
have more than 200 residues, with an average domain size of about 100 residues.

A protein can have a single domain or multiple domains, each characteristically
assigned a precise function (Teichmann 2002), and the combination of these
domains can define the function and interactions of the protein (Ingolfsson and Yona
2008). Homologous protein domains may show conserved interaction
patterns across a domain superfamily, as they preserve the same three-dimensional
structure. Each domain also forms a three-dimensional structure that
is independently stable and folded. Because the 3D structure of proteins is critical
for the prediction of PPIs, the interacting interfaces (domains) must be folded into
specific conformations so that they can interact with other proteins (physically and
energetically) (Fig. 6.4). Therefore, identifying interactions at the domain level is
an important step toward the identification of PPIs (Deng et al. 2002; Lee et al.
2006; Ng et al. 2003; Guimaraes et al. 2006; Ta and Holm 2009).
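
The idea of inferring PPIs at the domain level can be sketched as a lookup: two proteins are flagged as candidate interactors if any pair of their domains is a known DDI. This is a deliberately simplified illustration, not one of the published prediction methods cited above; the domain labels, protein names, and the small DDI set are hypothetical.

```python
from itertools import product

def supporting_ddis(domains_a, domains_b, known_ddis):
    """Return the known domain pairs that could mediate an A-B interaction."""
    hits = set()
    for d1, d2 in product(domains_a, domains_b):
        if frozenset((d1, d2)) in known_ddis:
            hits.add(tuple(sorted((d1, d2))))
    return sorted(hits)

# Hypothetical DDI set, stored as unordered pairs.
known_ddis = {
    frozenset(pair)
    for pair in [("SH3", "Proline_rich"), ("PDZ", "PDZ_ligand"), ("Kinase", "SH2")]
}

# Hypothetical domain annotations for three proteins.
protein_domains = {
    "protA": ["SH3", "Kinase"],
    "protB": ["Proline_rich"],
    "protC": ["Zn_finger"],
}

ab = supporting_ddis(protein_domains["protA"], protein_domains["protB"], known_ddis)
ac = supporting_ddis(protein_domains["protA"], protein_domains["protC"], known_ddis)
print("protA-protB supported by:", ab)
print("protA-protC supported by:", ac)
```

A real pipeline would take domain annotations from resources such as Pfam or InterPro and DDIs from the databases in Table 6.8, and would typically weight each domain pair by its evidence rather than treat every hit equally.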

Several methods have been used to predict domain interactions from PPI
data graphs, but the first DDIs were identified from the 3D structures of protein
complexes in the Protein Data Bank. Databases such as iPfam, 3DID, and PInS
extract DDIs from the interacting interfaces of known protein 3D structures. In
Saccharomyces cerevisiae, Escherichia coli, Caenorhabditis elegans, Drosophila
melanogaster, and Homo sapiens, known DDIs cover less than 20% of the PPIs in
databases. To complement these DDIs, various computational methods for DDI
prediction have been proposed in recent years (Table 6.8). Although 3D structures are the


[Figure panels: protein sequence (primary structure, e.g., ..APSTLVRTVTMA..); secondary structure; multidomain protein; protein assembly (docked complex); protein-protein interaction network]

Fig. 6.4 A workflow of protein structure organization. Three basic levels are assigned to a protein:
primary, secondary, and multidomain (tertiary and quaternary structure). Multidomain proteins
then form structural networks by interacting with each other at the residue level, forming a
complex network

Table 6.8 Databases and tools for domain-domain interaction

Tools | Description | Web link
3D interacting domains (3DID) | Search for domain-domain interactions in proteins for which high-resolution three-dimensional structures are known | www.3did.irbbarcelona.org
PFAM | A semiautomated definition of families (HMM based) | www.pfam.sanger.ac.uk/
SMART | Smaller number of domain families (manually curated) | www.smart.embl-heidelberg.de/
InterPro | Perhaps the most comprehensive tool | www.ebi.ac.uk/interpro/
Database of Domain Interactions and Bindings (DDIB) | Search for documented information on biomolecule interaction, especially protein domain-domain interactions | www.ddib.org/
DOMINE (a database of protein domain interactions) | Database of known and predicted protein domain (domain-domain) interactions | www.domine.utdallas.edu/
DOMINO (a database of domain-peptide interactions) | Search for annotated experiments describing interactions mediated by protein interaction domains | www.mint.bio.uniroma2.it/domino/
CSDBase | Cold Shock Domain database | www.chemie.uni-marburg.de/~csdbase/
DIMA | Domain interaction map: experimental and predicted protein domain interactions | www.mips.gsf.de/genre/proj/dima2
DomIns | Database of Domain Insertions: domain insertions in known protein structures | www.domins.org/
InterDom (a database of putative interacting protein domains for validating predicted protein interactions and complexes) | Find evidence for the detected protein interactions based on putative protein domain interactions | www.InterDom.lit.org.sg
PDZBase (protein-protein interactions involving PDZ domains) | Search for information on protein-protein interaction involving PDZ domains | www.icb.med.cornell.edu/services/pdz/start
PepCyber:P~PEP (a database of human protein-protein interactions mediated by phosphoprotein-binding domains) | Database specialized in documenting human PPBD-containing proteins and PPBD-mediated interactions | www.pepcyber.org/PPEP/
PROCOGNATE (a cognate ligand domain mapping for enzymes) | Database of cognate ligands for the domains of enzyme structures in CATH, SCOP, and Pfam | www.ebi.ac.uk/thornton-srv/databases/procognate/
The Homeodomain Resource | Search for curated information for the homeodomain protein family | www.genome.nhgri.nih.gov/homeodomain

basis for predicting DDIs, many other computational resources such as Pfam and CDD
(Conserved Domain Database) begin by annotating domains from primary
sequences. These databases share protein domain annotation data, although each
database has its own annotation format. Some databases, e.g., InterPro (Hunter
et al. 2009) and CDD (Marchler-Bauer et al. 2011), provide protein domain
annotation information collected from several databases. Thus, domains are considered
the fundamental units of protein structure, folding, function, and evolution.

















Conclusions 

The genomics and proteomics approaches coupled with bioinformatics tools have 
a symbiotic relationship; new experimental methods require newly adapted tools 
besides well-established techniques. The bioinformatics tools enable high- 
throughput production of experimental results with quality control, transformation 
of these results into protein networks, and their exploitation through visualization 
and analysis tools. Hence, bioinformatics-based approaches serve as an additional 
tool for data mining, as well as for gene identification and disease prediction. 
7 Genomics of the Human Y Chromosome: Applications and Implications


Sanjay Premi, Jyoti Srivastava, and Sher Ali 


7.1 Introduction 


Mammalian sex chromosomes are often uniquely different from each other in terms
of their structural organization and genes related to sex determination and
differentiation. For instance, the ~78 genes identified on the male-specific region
of the human Y chromosome (MSY) are expressed mostly in testis and code for ~27
distinct proteins. According to a widely held hypothesis, the human Y chromosome lost
most of its genes during evolution, except those essential for male sex determination.
This evolutionary degeneration of the Y chromosome is commonly linked to its inability
to undergo homologous recombination with the X chromosome or any of the autosomes.
Owing to its “gene-poor” landscape and continuously decreasing size, the Y chromosome
was hypothesized to disappear in ~10 million years. However, abundant
literature from modern research provides evidence of its continued persistence.
First, the MSY makes up a large portion of the Y chromosome and is the reason
the Y does not participate in X-Y recombination. Any further reduction in its size would
be a serious threat to human existence. Second, the MSY is a result of segmental
duplications (Hurles and Jobling 2003). These duplications lead to gene conversions
and thus protect the human Y chromosome. Also, the Y chromosome is dominant,
as witnessed by the male phenotype of patients with multiple X chromosomes but only a
single Y chromosome. Moreover, the highly palindromic and repetitive landscape of
the Y chromosome leads to an enhanced mutation rate, which fuels higher levels of
polymorphism (Jobling et al. 2007). The MSY was described in detail by Jobling
and Tyler-Smith in 2003 (Fig. 7.1). Surprisingly, several Y chromosome haplotypes
maintain fertility even without essential Y-linked genes. This highlights two facts:
Fig. 7.1 Representative geography of the human Y chromosome. Locations of various loci,
genes, and cytogenetic features are shown from left to right with numbering originating from short
arm telomere. IR is inverted repeats and P for palindromes. The 27 known Y-linked genes and their
estimated expression are also shown. Some of the well-studied clinical phenotypes associated with
Y chromosome are designated on the right. This figure is taken from Jobling and Tyler-Smith (2003)
Nature Reviews Genetics 4, 598-612
the repetitive landscape of the Y acts as a buffer against the loss of its genes, and
interactions between the Y chromosome and autosomes might be essential for sustaining
male fertility. This chapter describes the organizational complexities of the Y
chromosome under various normal and disease phenotypes and the effect of
exogenous/environmental factors in augmenting these complexities. Owing to its unique
structural organization and nonhomologous nature, the Y chromosome is a valuable tool
for DNA-based diagnosis under normal and abnormal conditions related to male fertility.





7.2 Y Chromosome Evolution 


The Y chromosome in humans and the equivalent W chromosome in several other
animals evolved independently. However, both of these sex chromosomes are gene
poor, heterogametic, and often smaller than the autosomes (Hurst and Randerson
1999). Four different evolutionary processes have been hypothesized for the
emergence of Y and W. The first is asexual degeneration, defined as the degeneration
of a particular chromosome owing to the loss of its ability to recombine. This leads
to loss of genes/DNA from the sex-specific chromosomes Y and W (Charlesworth and
Charlesworth 2000; Steinemann 2000). The second is sexual antagonism, which is the
augmented fitness of a single gene or a pool of genes on particular chromosomes. This
masculinized the Y chromosome and feminized the X chromosome (Brooks 2000;
Charlesworth et al. 1987; Gavrilets et al. 2001; Wolfenbarger and Wilkinson 2001).
The third is constant selection, which is the sustained inheritance of the Y chromosome
in human males, leading to uniparental inheritance (Ting 1998; Wu et al. 2000). The
last hypothesis is hemizygous exposure, which is the functional promotion of a
particular chromosome (Y in humans) by fixing recessive mutations on the partner
X chromosome. This process predicts the masculinization of the X chromosome.
This is seen largely in reptiles, where the Y or W chromosome alone does not decide the
sex and sex determination instead relies on temperature. This maintains flexibility
and sex ratios.

Since the Y and X chromosomes have distinctly different sizes, the Y chromosome
is suggested to be a degenerated autosome, whereas the X behaves just like the
autosomes. The pseudoautosomal region (PAR), a small region of Y homology with
the X chromosome, consists of X-transposed and X-degenerate sequences (Skaletsky
et al. 2003). Thus the X and Y probably originated from an autosomal chromosome
pair. It is hypothesized that degeneration and differentiation of the Y started when it
accumulated sex-determination genes (Waters et al. 2001, 2005). Some evolutionary
traces of the human Y chromosome can still be seen in placental mammals,
marsupials, and monotremes (Delbridge et al. 1997; Glas 1999; Pask et al. 2000).
Although the expression profile of Y-linked genes is still being explored, the
human Y chromosome is considered the most evolved sex chromosome. Further
detailed investigations are envisaged to characterize the variability in expression of
the Y-linked genes under abnormal conditions. Such information is critical for
diagnosis, prognosis, comparative genomics, and probably for ascertaining the causes
of male infertility, in addition to being an important tool for better understanding
of the Y chromosome.
7.3 Y-Linked Genes


Compared to functionally assorted genes or gene families on autosomes, Y-linked
genes have distinct expression profiles (Jobling and Tyler-Smith 2003). The
nonrecombining region on Y (NRY), or male-specific region (MSY), contains three
distinct classes of genes. Class 1 includes “X-degenerate” single copy genes that are
expressed ubiquitously and have X-linked homologues with similar functions. Class 2
includes “X-transposed” genes, thought to have transposed to the Y from the X
chromosome. These multi-copy genes are expressed specifically in testis. Class 3
includes single copy Y-linked genes such as SRY, which is expressed specifically in the
embryonic bipotential gonad and adult testis (Harley et al. 2003). Other genes
belonging to this category are AMELY and PCDHY which, with their active X-linked
homologues, are expressed in developing tooth buds and brain, respectively (Crow 2000;
Nachman 2001; Oota et al. 2001). Recently, another class of MSY sequence, the
ampliconic regions, whose copies share more than 99.9% homology, has been reported
(Skaletsky et al. 2003). The ampliconic sequences harbor the highest density of coding
and noncoding Y-linked genes, with variable copy numbers.

In addition to the MSY, several Y chromosomal regions do not abide by the general
rules of genetics. For example, most genes on the PAR maintain dosage compensation
by eluding X-inactivation, whereas synaptobrevin like 1 (SYBL1) and sprouty
homologue 3 (HSPRY3) on the long arm PAR undergo X and Y inactivation
in females and males, respectively. This epitomizes a complex evolutionary trail of
gene translocations across X and Y (Jobling and Tyler-Smith 2003). Heterochromatin on
the Y chromosome is not expected to be transcribed; however, several chimeric and
testis-specific transcripts have been reported recently from the long arm Y
heterochromatin (Jehan et al. 2007). Genes on the Y chromosome are thought to have
originated on several autosomes. One such example is the RNA-binding motif gene
(RBMY), a spermatogenesis gene from the proto-XY pair. A nearly identical homologue of
RBMY, called HNRPG or RBMX, is present on the human X chromosome
(Delbridge et al. 1999). RBM is also present on the X and Y chromosomes in the rodent
genome (Mazeyrat et al. 1999). Thus, it can be construed that RBM/RBMY
evolved from very old and common ancestors such as the proto-XY pair.

Two genes, ZFY and DFFRY (USP9Y), are examples of recent additions to the Y 
chromosome. Both of these genes have copies on the X chromosome which probably 
evolved into Y homologues. But the marsupial ZFY and USP9Y are autosomal, 
suggesting their recent addition to the eutherian X and Y chromosomes. ZFY 
and ZFX are expressed ubiquitously in humans, whereas ZFY is testis specific in mouse 
(Koopman et al. 1989). Similarly, expression of DFFRY is ubiquitous in humans 
and testis specific in mouse. DFFRY mutations are commonly detected in infertile 
human males (Saxena et al. 1996, 2000). Further, this gene seems to affect ovarian 
functions in Drosophila. It can be hypothesized that this gene acquired gonadal 
functions and then translocated to the Y chromosome. 

Another example is the DAZ gene family, whose four copies code for RNA-binding 
motifs but lack an X chromosome homologue. The autosomal homologue DAZLA has 
been reported in mice and marsupials, but DAZ itself has not (Delbridge et al. 1997). 
These DAZ genes probably originated from transposition (Saxena et al. 1996, 2000). 
A further example is the multicopy human CDY gene, which is completely absent from 
the mouse genome. Instead, the autosomal CDYL performs both ubiquitous and 
testis-specific functions in mouse. Human CDYL contains introns, but CDY on the Y is 
intronless, indicating that the latter must have translocated from chromosome 6 to Y 
through retroposition (Dorus 2003; Lahn 1997; Mazeyrat et al. 1999). 

Interestingly, the most celebrated sex determination gene, SRY, also has a homo- 
logue SOX3 on the X chromosome. The SOX3 demonstrates dose-based sex differ- 
entiation signals in marsupials (Pask et al. 2000). In higher mammals like mouse 
and humans, SOX3 specifically expresses in central nervous system (CNS) and 
genital ridge (Shen and Ingraham 2002). Sequence comparisons suggest SRY as a 
surviving relic of SOX3 and possibly arose from a gene on the proto-sex chromo- 
somes. It is likely that SRY and SOX3 have different modes of actions in different 
species and possible brain functions in both sexes (Graves 2002). 





7.4 Y Chromosome and Male Sex Determination 


Sex determination and male (in)fertility are strongly linked to the Y chromosome. 
Mouse models with two X chromosomes but transgenic for SRY develop male 
gonads suggesting SRY to be the testis-determining factor (TDF). Also, mutations 
in the SRY gene frequently lead to dysgenesis of male gonads in XY females which 
further supports its essential role in male sex determination (Harley et al. 2003). 
These mutations are often missense in the high-mobility group (HMG) domain of 
the SRY protein (Harley et al. 2003). Functionally, SRY is a transcription factor 
which binds to the DNA in a typical sequence-specific manner, a function which if 
impaired leads to clinical XY gonadal dysgenesis (Harley et al. 1992; Nasrin et al. 
1991). Of several mutations often seen in the SRY, the ones located in the HMG 
box specifically disrupt its DNA-binding capability (Harley et al. 1992; Mitchell 
and Harley 2002; Pontiggia et al. 1995; Schmitt-Ney et al. 1995; Tiepolo and 
Zuffardi 1976). 

In addition to the SRY, genes located in the azoospermia factor (AZF) region on 
Y chromosome are also crucial for maintaining spermatogenesis (Vogt 1996). AZF 
comprises three nonoverlapping subregions arranged from proximal to distal 
long arm euchromatin of the Y chromosome. These subregions are called AZFa, 
AZFb, and AZFc (Vogt 1996), and each has its own candidate genes. Ubiquitin- 
specific protease 9, Y chromosome (USP9Y) gene, covers almost half of the AZFa 
sequences (Brown et al. 1998; Mazeyrat et al. 1999; Vogt 1997). Loss of USP9Y 
from AZFa is generally associated with spermatogenic failures (Ferlin 1999; Foresta 
et al. 2000; Vogt 1997). Additionally, dead box on Y chromosome (DBY) and ubiq- 
uitous TPR motif on the Y (UTY) are two more candidate genes on the AZFa region 
(Lahn 1997; Mazeyrat et al. 1999). It has been suggested that USP9Y deficiency 
leads to altered spermatogenesis and this oligospermic phenotype is worsened with 
additional loss of DBY (Brown et al. 1998; Sun et al. 2000; Foresta et al. 2000). 
Though DBY is frequently deleted in infertile males, it is expressed ubiquitously. 
Hence, the specific functional role of DBY in spermatogenesis and male germ cell 
development is still not completely explored. 

Currently, translation initiation factor 1A-isoform Y (EIF1AY), RNA-binding 
motif on Y (RBMY), and heat shock transcription factor on Y (HSFY) are the only 
three genes attributed to the AZFb region (Skaletsky et al. 2003; Tessari 2004). 
EIF1AY is expressed ubiquitously with one transcript specifically detected in testis. 
EIF1AY codes for a translation initiation factor, eIF-1A, and the biological function 
of eIF-1A still remains to be determined. RBMY belongs to a family of 20-50 genes 
and pseudogenes which are spread over both the arms of Y chromosome with one 
locus in AZFb (Prosser et al. 1996). RBMY is specifically expressed in male germ- 
line cells substantiating its role in spermatogenesis. Protein coded by HSFY con- 
tains an HSF type of DNA-binding domain, related to HSF2 gene on chromosome 6. 
This gene is active during spermatogenesis and embryogenesis only (Pirkkala 
2001). Overall, the large Y chromosomal deletions detected in infertile males often 
have their breakpoints in the AZFb region (Ferlin 2003). 

AZFc region is located in distal Yq11, and its deletion is the most frequent cause 
of azoospermia or severe oligozoospermia and hence male infertility (Krausz 1999). 
AZFc is composed of large blocks of repeat sequences leading to the ampliconic 
landscape (Kuroda-Kawaguchi et al. 2001). These amplicons are arranged into pal- 
indromes which are responsible for duplications, deletions, and gene conversions 
through intrachromosomal recombination. The large palindromes with high 
intrachromosomal sequence homology also ensure structural integrity. Currently, eight 
spermatogenesis genes have been attributed to the AZFc: BPY, CDY1, 
CSPG4LY, DAZ, GOLGA2LY, TTY3, TTY4, and TTY17 (Kuroda-Kawaguchi et al. 
2001). Distinct proteins have only been detected for three of these genes. The 
deleted in azoospermia (DAZ) gene consists of four nearly identical copies. Each of 
these copies codes for DAZ repeats at the C-terminus and an RNA-binding domain at 
the N-terminus (Mahadevaiah 1998; Reijo et al. 1995). Owing to its recurrent 
deletion in human males with spermatogenic failure, DAZ is considered to be one of the 
potential spermatogenesis genes located on AZFc (Lahn 1997; Wong et al. 1999; 
Yen 1998). The functional significance of the other AZFc genes remains unknown except 
that most of them have multiple copies with testis-specific transcription. Based 
upon its location in the DAZ locus, one of the two CDY genes is also presumed to be 
essential for spermatogenesis (Yen 1998). Individual functional analyses of the 
CDY and DAZ genes will uncover the actual relative significance of these two genes 
in spermatogenesis. 

In addition to specific genes, Y chromosomal microdeletions are also envisaged 
to cause spermatogenic failure and thus azoospermia and oligospermia. 
Sequence-tagged sites (STSs) are generally used to map these microdeletions 
and localize them to specific AZF regions during the standard diagnostic regime 
(Vollrath et al. 1992). One key contradiction is the detection of ~0.4% of all known 
microdeletions in normal males, which are postulated to be part of natural 
polymorphism (Kent-First et al. 1999; Kobayashi et al. 1995; Pryor et al. 1997). 
There is a prevalence of microdeletions (up to 35%) in infertile males which may 
reflect pressure from natural selection or indicate an actual diagnostic/prognostic 
marker. Thus, the clinical relevance of microdeletions is still debatable. 
Nevertheless, microdeletions are frequently concentrated in the AZFc region (up to 
59.6%) including DAZ loci, the AZFb region (up to 15.8%) including RBMY loci, 
and the AZFa region (up to 4.9%). Microdeletions in the AZFc region are strongly 
linked with azoospermia, severe oligozoospermia, and testicular pathologies 
ranging from Sertoli cell-only syndrome (SCOS) to spermatogenic arrest and 
hypospermatogenesis. On the other hand, AZFa deletion mostly leads to SCOS 
and complete arrest of spermatogenesis which is often traced back to mutations 
and deletions of the USP9Y or DBY genes (Ferlin 1999, 2003; Foresta et al. 2000; 
Lin et al. 2005). 
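
The STS-based localization of microdeletions described above can be sketched in a few lines of Python. This is only a minimal illustration and not a diagnostic protocol: the marker panel (two STSs per AZF region) and the function name deleted_regions are assumptions introduced for the example and are not taken from the text. A region is flagged only when every one of its panel markers has been explicitly scored as absent.

```python
from typing import Dict, List

# Illustrative STS panel: these marker names are an assumption for this
# sketch and do not come from the chapter.
AZF_PANEL: Dict[str, List[str]] = {
    "AZFa": ["sY84", "sY86"],
    "AZFb": ["sY127", "sY134"],
    "AZFc": ["sY254", "sY255"],
}


def deleted_regions(sts_present: Dict[str, bool]) -> List[str]:
    """Return the AZF regions whose panel markers were all scored as absent.

    sts_present maps an STS name to True (amplified) or False (no product).
    Markers missing from the dictionary are treated as untested, so they
    never contribute to a deletion call.
    """
    deleted = []
    for region, markers in AZF_PANEL.items():
        if all(sts_present.get(marker) is False for marker in markers):
            deleted.append(region)
    return deleted


# Hypothetical PCR result for one male: both AZFc markers failed to amplify.
example = {
    "sY84": True, "sY86": True,
    "sY127": True, "sY134": True,
    "sY254": False, "sY255": False,
}
print(deleted_regions(example))  # ['AZFc']
```

Requiring all markers of a region to be absent is a deliberately conservative choice for the sketch; a single failed amplification is more likely to be a technical artifact or a polymorphism than a true microdeletion.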

Intrachromosomal homologous recombination (ICHR) is prevalent on the human 
Y chromosome due to palindromic repeats, causing major interstitial deletions 
(Repping et al. 2002; Saxena et al. 2000; Sun et al. 2000). AZFc is most prone to 
ICHR, followed by the AZFa region. Moreover, human endogenous retrovirus 
(HERV15) sequences in AZFa are known to recombine intrachromosomally and 
cause major deletions or duplications responsible for the azoospermia phenotype 
(Bosch 2003; Sun et al. 2000). The additional 100-200 bp repeats spread all over 
the AZFa and the HERV sequences further increase the instances of ICHR 
leading to minor and major deletion events. Similar to AZFa, the AZFc contains six 
major amplicons (115 kb gray to ~700 kb yellow) which are nearly identical 
(Skaletsky et al. 2003). The palindromes formed by these amplicons are major 
targets for ICHR. For instance, recombination between palindromes P5 and P1 leads 
to a massive deletion extending from the AZFb region to ~1.5 Mb into AZFc, 
removing ~6 Mb in total (Repping et al. 2002). A second recombination event between 
P5 and P1 removes ~7.7 Mb, which is one of the largest ICHR-mediated deletions known. 
ICHR-mediated deletions and structural reorganization in the AZFc region are also 
known to be implicated in spermatogenic failure (Vogt 2004; Repping et al. 
2003). In conclusion, ICHR-mediated chromosomal alterations and microdeletions 
are common in patients with male infertility (Marshall Graves 2000; 
Premi et al. 2008). 
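
How recombination between two homologous repeats removes the intervening DNA can be illustrated with a toy string model. The sketch below is a simplification under stated assumptions: it treats the chromosome as a plain string, assumes two copies of a repeat in the same (direct) orientation, and uses a hypothetical function name, ichr_deletion. Real AZF rearrangements involve far larger and more complex repeat structures.

```python
def ichr_deletion(sequence: str, repeat: str) -> str:
    """Toy model of a deletion caused by recombination between repeats.

    Crossing over between the first and the last copy of `repeat` leaves a
    single hybrid copy and removes everything that lay between the two
    copies. If fewer than two copies exist, the sequence is returned as is.
    """
    first = sequence.find(repeat)
    last = sequence.rfind(repeat)
    if first == -1 or first == last:
        return sequence
    # Keep the DNA before the first copy, then resume at the last copy.
    return sequence[:first] + sequence[last:]


# Made-up toy sequence: flank + repeat + interval + repeat + flank.
before = "AAAA" + "GATTACA" + "CCCCCC" + "GATTACA" + "TTTT"
after = ichr_deletion(before, "GATTACA")
print(after)                     # AAAAGATTACATTTT
print(len(before) - len(after))  # 13 bases lost: one repeat copy plus the interval
```

The lost segment always equals one repeat copy plus the interval between the copies, which is why widely spaced repeats produce megabase-scale deletions.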





7.5 Polymorphic Nature of the Y Chromosome 


Segmental duplications and copy number variations (CNVs) are extensively 
described in the human genome (Bailey 2002; Sebat 2004). Owing to its repetitive 
nature, the human Y chromosome endures extensive deletions, inversions, and 
neutral translocations (Bernstein et al. 1986; Schmid et al. 1984). The repetitive nature 
also contributes to Y polymorphism among different ethnic groups. However, due to 
lack of concrete data, the haplotypic polymorphism cannot yet be completely 
explained by palindrome-based ICHR. 

Prenatal sex determination, archaeological analysis, forensic typing, and 
paternity testing are based on a routine assay for the amelogenin gene AMELY on Yp. 
However, an inverted repeat 3 (IR3)-based inversion on Yp has been reported, 
leading to two different orientations of the AMELY (Skaletsky et al. 
2003). Similarly, ICHR between various copies of the testis-specific protein on Y 
(TSPY) gene is known to delete this gene from the Yp region in human males (Santos 
et al. 1998). Evolutionarily, TSPY deletions have been reported at least seven times 
independently. Moreover, some deletion events mediated by ICHR, often leading to 
deletion of PRKY and TBL1Y, are neutral and do not cause any clinical phenotypes 
(Jobling et al. 2007). 

Palindromic repeats and their structural organization on the AZFc region are 
extremely prone to inversions and conversions which have been demonstrated by 
mapping sequence-tagged sites (STSs) and single-nucleotide variations (SNVs). 
Investigations on Y chromosome haplotypes coupled with STS mapping have 
demonstrated complete lack of DAZ1/DAZ2 and BPY2.2/BPY2.3 duplets in the 
haplogroup N (Fernandes et al. 2004). This is construed to be normal genetic vari- 
ation of the human genome and not linked with male infertility. Moreover, sub- 
stantial Y chromosomal variations are prevalent among individuals across the Y 
haplogroups or even the same haplotypes. In addition to megabase deletions and 
duplications, the variations also include specific gene structures, sizes, and dupli- 
cations/deletions. Four DAZ genes in a single locus vary in sizes with DAZ1, 65 kb; 
DAZ2, 70 kb; DAZ3, 50 kb; and DAZ4, 55 kb. The central genomic fragment sepa- 
rating two DAZ duplets at each DAZ locus is composed of tandemly repeated 
~2.4 kb repeat units. Further, each DAZ gene harbors variable numbers and 
sequences of exon 7 and a 10.8 kb unit. Additionally, this 10.8 kb unit has three 
copies in DAZ1 and two copies in DAZ4 (Saxena et al. 2000). Furthermore, DAZ 
genes carry variable numbers of RNA-binding motifs (RBM) 
(Premi et al. 2010). 

Our own investigations have uncovered massive polymorphisms in Y chro- 
mosomal landscape which are either endogenous in normal males and males 
with sex chromosome-related anomalies or induced by exogenous factors like 
radiation exposure. We uncovered a unique structural reorganization in clini- 
cally normal males which did not agree with the structure reported earlier from 
full-length sequencing of human Y chromosome (Repping et al. 2002, 2003). 
Detailed investigation on copy number variations and STSs from ~1000 normal 
males revealed a singular structural organization (Premi et al. 2010). This inves- 
tigation also identified a surprising translocation of inter-DAZ genomic frag- 
ments from Yq to Yp in Indian and European Y chromosome. The European Y 
chromosome was used as a control because the first whole length Y chromo- 
somal sequencing was performed on DNA from a Caucasian male. This translo- 
cation still remains a functional and physiological mystery, more so when the 
promoters for all four DAZ genes are located in the translocated inter-DAZ 
region. Localization of this inter-DAZ segment on short arm instead of long arm 
of the Y chromosome implicates an unexplored promoter region either within or 
outside the DAZ locus on the long arm. This also suggests that in spite of 
harboring promoters for crucial spermatogenesis genes, the inter-DAZ region is just 
another palindrome which translocated from Yq to Yp through ICHR. This 
translocation is explained in Figs. 7.2 and 7.3. 

Fig. 7.2 Locations of various fluorescence in situ hybridization (FISH) probes that we used in our 
investigations on the polymorphic nature of human Y chromosome. Red circles indicate probes for 
that particular region which may encompass a single gene, palindrome, intergenic regions, or 
multiple genes. (a) Y chromosome cartoon with AZFc region as vertical rectangular bars. (b) 
Hypothetical expansion of the AZFc region to demonstrate locations of various FISH probes. (c) 
Inverted repeat 3 on long arm of the Y chromosome 

Further, our own studies suggest that the inter-DAZ region is just a small part of 
ICHR-mediated, intrachromosomal, and interchromosomal translocations. We 
localized the Y-specific AZFc amplicons P1.1/1.2 on proximal and distal regions of 
chromosome 15 in addition to the Y chromosome. This polymorphism was further 
substantiated by localization of the P1.1/1.2 genes TTY3 and XKRY onto the proximal 5p 
region, but not on the Y chromosome. We hypothesized that these Y chromosomal 
sequences might not be essential for sex determination or spermatogenesis, and 
hence the genome tolerates their translocation onto the autosomes. However, the 
Caucasian Y map from the database still puts these regions onto the Y chromosome. 
Nevertheless, such variations reiterate the fact that the Y chromosome acquired building 
blocks from various autosomes, or Y itself is a degraded version of an autosome. We 
also hypothesized that the DAZ genes underwent several rearrangement events 
leading to the current AZFc structural configuration where segments of the AZFc occupy 
both the Yp and Yq locations. Evolutionarily, the DAZ and CDY genes, both from 
the AZFc region, are present on the short arm of the Y chromosome in pygmy 
chimpanzee and the Sumatran orangutan. Also, the DAZ genes have an autosomal 
homologue called DAZLA. In conclusion, the AZFc sequences including important 
genes originated on the autosomes, translocated onto the Yp region and finally to 
the Yq region. Both CDY and DAZ are proposed to be present in highly repetitive 
sequence clusters on the Y chromosome. The AZFc region was never investigated 
empirically before. Our investigations of the MSY uncovered a unique organizational 
variation in terms of DAZ loci and megabase regions of the AZFc itself. This 
organization has never been reported, and it adds to the preexisting variations and 
polymorphisms of the Y chromosome. Analysis of large cohorts of Y chromosomes 
from different geographical regions all over the world is warranted to completely 
understand the complexity of its structural organization. 

Fig. 7.3 Expected and observed structure of the Y chromosome palindromes and amplicons in 
750 Indian males. The human Y chromosome map published in 2003 was taken as reference 
(http://www.nature.com/nature/focus/ychromosome/). (a) AZFc region, specifically the DAZ 
genes and inter-DAZ region. (b) TTY3 and CDY1 gene regions. Taken from Premi et al. (2010) 
Chromosome Research 18: 419-430 

In spite of evolutionary degradation, the Y chromosome has adapted to 
endogenous insults and exogenous environmental factors; otherwise the human species 
would cease to exist. For instance, exposure to high levels of natural radioactivity induces 
severe sequence-based and structural polymorphism on the Y chromosome. 
Prolonged experimental irradiation followed by genome analysis is not ethically feasible in 
human beings. We exploited a natural setting in the South Indian state of Kerala where 
the level of natural radioactivity is very high due to 10% thorium phosphate 
monazite present in the beach sand (Pask et al. 2000). Our own study revealed that the Y 
chromosome uses gene duplications, sequence polymorphisms, and structural 
alterations to buffer genotoxic effects of natural radioactivity (Premi et al. 2009). 

As described above, absence of a homologue, repetitive landscape, and ICHR 
make human Y chromosome highly vulnerable to polymorphisms and sequence 
alterations (Crow 2000; Jehan et al. 2007). Since Y does not undergo homologous 
recombination, such polymorphisms and variations are passed onto the next genera- 
tion through paternal inheritance. Since Y chromosome decides the sex, the lethal 
and clinical mutations or chromosomal anomalies are never inherited. Our analyses 
of the human Y chromosome from radiation-exposed males also uncovered that 
most of the microdeletions, sequence polymorphisms, and gene duplications did not 
follow a normal inheritance pattern. However, the radiation exposure did not affect 
the germline DNA (from sperm). This leads to the hypothesis that, owing to its 
proneness to alteration, the Y chromosome absorbs radiation genotoxicity in the form 
of somatic alterations, whereas the germline remains unaffected so that the 
continuation of the species is ensured. An alternative explanation, however, is that meiotic 
germ cells have a high turnover and this removes any damaged cells before they can 
transform into active 
spermatozoa. Nevertheless, a detailed analysis of various cellular pathways respon- 
sible for radiation genotoxicity is envisaged to uncover a full extent of damage 
induced by radiation exposure and its impact on Y chromosome physiology. Some 
of these pathways are tumor suppression, apoptosis, genome imprinting, epigenetic 
modification, and controls of signal transduction. 

The radiation exposure also enhanced the polymorphism by inducing gene dupli- 
cations. Sequence analyses revealed that of all the copies, sequence of at least one 
remains unaffected. This suggests the buffering effect of Y chromosome explained 
above so that radiation genotoxicity can be neutralized. Some examples of such 
polymorphic multi-copy genes are SRY and CDY1. In addition to gene duplication, 
radiation exposure also enhanced the expression of Y-linked genes in white blood 
cells. Based upon this, it is logical to conclude that radiation exposure modifies the 
transcriptional machinery so that it either does not differentiate between polymor- 
phic copies or enhances the expression of the only normal gene copy. Both scenar- 
ios ensure sufficient supply of the transcripts from affected genes so that a normal 
function can still be maintained. Thus, exogenous factors like radiation exposure 
enhance the preexisting structural and sequence polymorphism on Y chromosome. 
However, the palindromes and segmental repeats protect the Y chromosomes by 
buffering and neutralizing the genotoxicity. 

A complete map of all possible Y polymorphisms is difficult to generate. 
However, an overview of its structural integrity across various haplotypes is 
absolutely essential before drawing any clinical conclusions. Preliminary attempts have 
been made along these lines. In 2006, Jobling et al. analyzed Y chromosomes from 
each of the 47 branches of human genealogy (Jobling et al. 2006). This analysis 
revealed an expected variation in the length of Yq heterochromatin and megabase 
rearrangements of the AZFc region. 
7.6 Disintegrating Y Chromosome 


Lack of a homologous chromosome is one of the major reasons for evolutionary 
degradation of the Y chromosome. It is speculated that genetic modifications lead to 
a steady decrease in recombinational frequencies and ultimately to cessation of 
homologous recombination (Brooks 2000). In addition, it is also believed that Y 
chromosome acquired palindromic repeats which induced inversions and similar 
megabase rearrangements which then inhibited any homologous recombination 
between X and Y (Premi et al. 2009). Loss of recombination caused Y degradation 
because there is no selection against deleterious mutations. Owing to these facts, it 
is widely believed that the Y chromosome may disappear completely in about 5-10 
million years (Lin et al. 2005). However, this hypothesis has been countered by several 
previous reports and our own investigations. 

The first objection to the proposed ongoing degradation of the Y chromosome is the 
gene copy number. Modern methods like genome-wide arrays and genome 
hybridizations have revealed a large quantity of copy number polymorphisms (CNPs) 
which were either unknown or unexpected (Bailey 2002; Sebat 2004). CNPs are 
frequently found in the genic regions, especially the ones with segmental 
duplications. Many of these are implicated in resistance or susceptibility to a disease, 
responsiveness to a particular drug, or age-related ailments (Charlesworth and 
Hartl 1978). The SRY has long been known to be a single copy gene. However, in our own 
investigation, we established multiple polymorphic copies of this gene in response 
to the radiation exposure and also in males with sex chromosome-related anomalies 
(Premi 2006; Premi et al. 2007, 2008, 2009, 2010). Some Turner’s syndrome 
patients had as many as 16 copies of the SRY which suggested multiple rounds of 
duplication assuming that originally there was only a single copy. Absence of gene 
duplication in their parents meant de novo events of gene duplication. 

Table 7.1 Summary of gene duplication in 236 patients shown in % 
Figures in parentheses represent number of patients analyzed in a given category 

These patients did not inherit 16 copies from the fathers, but instead acquired 
them by de novo duplications. To highlight the role of polymorphism in ongoing 
existence of the Y chromosome, a summary of gene duplications and sequence 
variations from ~250 patients with sex chromosome-related anomalies is given in 
Table 7.1. In addition to microdeletions, the gene duplications were also 
inconsistent between males of two generations. Once again, the germline was 
normal in both generations (Premi et al. 2007, 2008) reestablishing the buffering effect 
of palindromic repeats. Moreover, higher frequency of CNP of Y-linked genes com- 
pared to that of autosomal CDYL and CDYL2 supported the alteration-prone nature 
of the Y chromosome. Thus, by virtue of being haploid and accessible to segmental 
duplication, Y chromosome seems to be affected more than any of the autosomes or 
X chromosome. The copy number polymorphism of Y-linked genes and palin- 
dromic regions is surprisingly high. However, this explains a continual genetic 
integrity and sustenance of the Y chromosome by tolerating endogenous and exog- 
enous genetic pressures (Bailey 2002; Premi et al. 2008; Sebat 2004). Multiplicity 
of the Y-linked sequences is probably due to nondisjunction which is supposed to 
maintain symmetrical copy numbers. Our investigations did not uncover this sym- 
metry between numbers of Y chromosomes in patients and corresponding copies of 
the Y-linked genes. This indicates a non-correlative relation between nondisjunction 
of the Y chromosome and copy number polymorphism, especially in the case of the 
Y chromosome. Such copy number variations lead to enhanced genomic complex- 
ity. This is particularly true for Y chromosome since all the duplicate copies do not 
follow corresponding expressional and mutational profile, within and between two 
generations (Marshall Graves 2000; Premi 2006). The copy number changes were 
never associated with translocations suggesting a tandem mode of gene duplication 
(Marshall Graves 2000; Premi et al. 2008). 
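
The symmetry argument above can be made explicit with a small check. If extra gene copies arose only from nondisjunction, the number of copies of a Y-linked gene should equal the number of Y chromosomes multiplied by the copies carried on each Y. The following sketch encodes that expectation; the function names and the assumption of a single Y chromosome in the second call are hypothetical and serve only as an illustration, while the figure of 16 SRY copies comes from the text.

```python
def expected_gene_copies(n_y_chromosomes: int, copies_per_y: int = 1) -> int:
    """Copies expected if copy number simply tracks the number of Y chromosomes."""
    return n_y_chromosomes * copies_per_y


def is_symmetric(observed_copies: int, n_y_chromosomes: int,
                 copies_per_y: int = 1) -> bool:
    """True when the observed copy number matches the nondisjunction expectation."""
    return observed_copies == expected_gene_copies(n_y_chromosomes, copies_per_y)


# Two Y chromosomes carrying one SRY copy each: symmetric, as nondisjunction predicts.
print(is_symmetric(observed_copies=2, n_y_chromosomes=2))   # True

# 16 SRY copies (as reported in the text) on an assumed single Y chromosome:
# asymmetric, so nondisjunction alone cannot account for the copy number.
print(is_symmetric(observed_copies=16, n_y_chromosomes=1))  # False
```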

Classically, gene duplication leads to a normal parental copy and one or more cop- 
ies with modified or new functional properties. Another proposed model for copy 
number change suggests that duplications are either followed by functional diver- 
gence or by functional complementation of the original copy (Premi 2006; Premi 
et al. 2007). In our own investigations, we concluded that exogenous stresses like 
radiation exposure initiate the duplication events which are followed by multiallelic 
tandem duplications to neutralize radiation genotoxicity. Ultimately, such duplication 
events are envisaged to prevent the degradation of the Y chromosome (Table 7.2). 


Table 7.2 Fate of Y chromosome linked genes/loci in patients with SCRA and males exposed 
to NBR 

Changes detected | NBR exposed | SCRA patients 
Microdeletions | Higher frequency in AZFc and AZFa than in AZFb | Almost same frequency in composite AZF region 
Gene duplications | One to two rounds of duplication; multiple polymorphic copies of the SRY, DAZ, CDY1, and XKRY genes | One to three rounds of duplication; multiple polymorphic copies of the SRY, DAZ, CDY1, and XKRY genes 
ICHR | No recombination | No recombination 
Genic deletions | No deletion except few males lacking DBY in blood DNA | Frequently lack XKRY, VCY, CDY1, CDY2, GOLGA2LY, TTY4, and BPY2 genes in random combinations 

ICHR Intrachromosomal homologous recombination, SCRA Sex chromosome-related anomalies, 
NBR Natural background radiation 
Fig. 7.4 Overall conclusion of evolution, degradation, and sustenance of human Y chromosome. It 
started with proto-XY/ZW chromosome which underwent various genetic/physical transformations 
to form modern day Y chromosome. These transformations include asexual degeneration, sexual 
antagonism, hemizygous exposure, and constant selection, all of which hugely reduced the physical 
size of Y chromosome. But along with these, the Y chromosome has been constantly supplemented 
by gene transfers from autosomes and its internal polymorphisms like copy number variations, 
palindromic and ampliconic duplications, gene conversions, etc. which helped it to sustain 


Comparative sequence analyses between human and chimpanzee Y chromo- 
some suggest that the Y has not undergone any degradation for the last ~6 million 
years (Hughes et al. 2005). This comparison also revealed that selective purifica- 
tion of sex determining genes is active on the Y chromosome. Intrachromosomal 
recombination further aids in maintaining the integrity of Y chromosome through 
gene conversions, duplications, deletions, inversions, purifying selection, and the 
hitchhiking of the sex determination factors. The role of such recombinational 
events in maintaining structural integrity of the Y chromosome (Sharp et al. 2005) 
is modeled in Fig. 7.4. Thus, the proposed ongoing degradation and ultimate 
demise of the human Y chromosome are not supported by empirical data and may 
not be true at all. 


Conclusions 

Owing to its well-documented role in spermatogenesis, significance of the Y chro- 
mosome in sex determination has been strongly substantiated. However, exact num- 
ber of Y-linked genes/loci and mechanisms controlling these phenomena are still 
obscure. Similarly, numbers of autosomal genes/loci implicated with testicular 
functions also remain a moot point. Thus, to be able to generate a consensus, current 
scenario warrants a comprehensive analysis of the global Y chromosomes both from 
normal and abnormal genomes. This in turn is envisaged to help us undertake an 
accurate diagnosis of the individuals suffering from sex chromosome-related anom- 
alies. This, however, cannot be achieved unless pathways of signal transduction and 
apoptosis leading to control and regulation of spermatogenesis are deciphered. 
Finally, simultaneous analysis of Y-linked genes, loci, and mRNA transcripts both 
from somatic tissue (blood) and germline (spermatozoa) would enable segregating 
those genes that are particularly active in the germline. Work on this line would form 
a rich and reliable basis of germline comparative genomics across the species. 

A number of Y-based markers are currently being used extensively for 
paternity testing, particularly to ascertain the paternity of the male child. However, a 
number of still unexploited markers/marker systems of the Y chromosome 
can be used both for paternity testing and DNA diagnosis. For 
instance, DYZ1 satellite in a single individual contains a total of 229 five-base- 
long repeat motifs “TTCCA” (Nakahori et al. 1986). This number is different in 
different individuals offering a rich source of innate sequence polymorphism to 
be exploited in the context of paternity. Likewise, a single array is about 3.4 kb, 
and since within the array there are regions that are highly variable, these 
regions can prove to be equally useful for unequivocal paternity testing based 
on PCR amplification, cloning, and sequencing. Systematic work along this line 
would have far reaching implications particularly, when a large number of 
males from across the different ethnic groups are screened. Even though com- 
prehensive analyses have been performed on Y chromosome, there still lies a 
vacuum as far as its fine genetic prints and working reprints are concerned. We 
therefore propose screening of additional Y chromosomes for a given abnor- 
mality from across the globe. That would uncover innate, novel, and singular 
polymorphisms as well as the ones that can be correlated with abnormal pheno- 
type. Expression studies of Y-linked genes from across the globe using samples 
representing normal and abnormal genomes may prove to be highly informative 
to pinpoint cause and effect of a disease on gene organization, regulation, and 
expression. Thus, there exists a scope to undertake analysis of the human Y 
chromosome for its large-scale biological applications and implications. 
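
As a rough illustration of how a copy-number polymorphism of the “TTCCA” motif could be scored once array sequences are available, the sketch below counts the motif in two sequences. The sequences and the function name `count_motif` are hypothetical and invented for this example; they are not real DYZ1 data. Python's `str.count` counts non-overlapping matches, which is sufficient here because “TTCCA” cannot overlap with itself.

```python
def count_motif(sequence, motif="TTCCA"):
    """Count occurrences of a motif in a DNA sequence (case-insensitive)."""
    return sequence.upper().count(motif.upper())


# Hypothetical toy sequences, invented for illustration (not real DYZ1 data).
individual_a = "TTCCA" * 5 + "GATTACA" + "TTCCA" * 3
individual_b = "TTCCA" * 4 + "GATTACA" + "TTCCA" * 2

copies_a = count_motif(individual_a)
copies_b = count_motif(individual_b)

# A difference in copy number is the kind of polymorphism the text refers to.
print(copies_a, copies_b, copies_a != copies_b)
```

A real analysis would also need to handle the reverse-complement strand and sequencing errors, which this sketch deliberately ignores.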

8.1 Introduction


The way to a man’s heart is through his stomach. 
John Adams, 1814 


The state of our gut governs not only the health of our body but also our mental
health and emotional status. Although this notion is more than a century old and
sporadic interest was visible from the 1970s, serious research on the gut
microbiome and its implications for our health has only just begun (Schmidt 2015).
The human body consists of about 40 trillion cells (Bianconi et al. 2013) with about
22,000 human genes in each cell (Pertea and Salzberg 2010). However, once microbes
become associated with it immediately after birth, the human body contains about
100 trillion cells and more than 2 million genes. The microbiota associated
with the human body makes up about 1-3% of the body mass, amounting to
2-6 pounds of microorganisms in a 200-pound adult (Turnbaugh et al. 2007, HMP
2007-2012). The additional cells mentioned above are microorganisms that,
apart from the gut, also reside on the skin surface, in the deep skin layers, in the
mouth, digestive tract and other human organ systems. The microorganisms that
colonize the human body are collectively referred to as the ‘human microbiome’
or ‘human microbiota’. The microbiome is central to human biology (Schnorr
2015). With so much microbiota associated with the human body, it has
become imperative to understand the role of the microbes that colonize it in order
to fully understand and appreciate human physiology and behaviour under
healthy and diseased conditions. With the progress of research in this field, it is
proposed that a better understanding of the human microbiome would pave the way
for successful treatment not only of lifestyle diseases but also of life-threatening
diseases and non-genetic behavioural disorders. With the completion of phase 1 of
the Human Microbiome Project (HMP 2008-2012) and the research carried out
subsequently, it has become clear that the human microbiome is associated
with obesity, cancer, mental health disorders, asthma and autism. While many
other aspects of these associations are yet to be investigated, it is not yet clear
whether the differential microbiome composition among diseased individuals is
a consequence of the disease itself or whether the differing microbiome causes the
disease.
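
The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable and function names are illustrative, and “more than 2 million genes” is treated as a lower bound.

```python
# Figures quoted in the text above; names are illustrative.
HUMAN_CELLS = 40e12        # about 40 trillion human cells
TOTAL_CELLS = 100e12       # about 100 trillion cells once microbes are counted
HUMAN_GENES = 22_000       # about 22,000 human genes
TOTAL_GENES = 2_000_000    # "more than 2 million" genes, used as a lower bound


def microbial_mass_range(body_mass, low_fraction=0.01, high_fraction=0.03):
    """Return the (low, high) microbial mass in the same unit as body_mass."""
    return body_mass * low_fraction, body_mass * high_fraction


microbial_cells = TOTAL_CELLS - HUMAN_CELLS
gene_ratio = TOTAL_GENES / HUMAN_GENES
low, high = microbial_mass_range(200)  # 200-pound adult

print(f"Microbial cells: about {microbial_cells:.0e}")
print(f"Total genes relative to human genes: about {gene_ratio:.0f}-fold")
print(f"Microbial mass: {low:.0f} to {high:.0f} pounds")
```

On these figures the microbial cells outnumber the human cells, and the combined gene count exceeds the human gene count by roughly two orders of magnitude.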





8.2 Inception of Microbiota with Human Body


It is pertinent to mention here that the human foetus grows in the sterile
environment of the uterus for about 266 days from conception till
parturition. The first human encounter with microbes occurs during the passage of
the infant through the birth canal, specifically the vaginal tract and the vulva (Ravel
et al. 2011). Skin surface microbes are perhaps the first colonizers, followed by those
of the nasal (respiratory) and oral (digestive) tracts, brought about by breathing
and external feeding. A stable flora on the skin, in the oral cavity and in the
intestinal tract becomes established with handling and feeding of the infant within
the first 48 h. The mode of birth, whether normal or caesarean, is also suggested to
influence the microbial colonization of the human infant to a great extent
(Mackie et al. 1999; Dominguez-Bello et al. 2010).





8.3 Diversity of Human Microbiota


Recent investigations revealed that the human microflora is exceedingly intricate
and includes more than 200 species of bacteria (Todar 2012). Factors such as
genetics, age, sex, stress, nutrition and dietary habits of the individual greatly
influence the diversity and abundance of the microflora. The estimated numbers of
bacteria present on the human skin, inside the mouth and in the gastrointestinal
tract are 10^12, 10^10 and 10^14, respectively (Mikelsaar and Zilmer 2009). The
number of bacteria in the human gut alone far exceeds the total number of human
cells (Gerritsen et al. 2011). The digestive system alone accounts for 55% of the
total human microbiota, followed by the skin, respiratory system and urogenital
system. Surprisingly, blood contains just about 1% of the total human microbiota,
while the conjunctiva has a negligible quantity of microbiota (Table 8.1). The
microbiota of the human intestine is suggested not only to help in digestion, produce
vitamins and promote gastrointestinal motility but also to balance the immune
system (Berg 1996), suggesting larger implications for human health and disease.
Disturbance of the microbiota-host relationship is associated with numerous chronic
inflammatory diseases and metabolic syndrome (Chassaing et al. 2015).

Table 8.1 Microbiota (prevalent genera) that colonize different human organ systems

S. no. | Body niche | No. of bacteria (as a % of the total microbiota in humans) | Prevalent genera
1 | Skin | 21 | Staphylococcus, Propionibacterium and Corynebacterium
2 | Gut | 29 | Bacteroides, Clostridium, Fusobacterium, Enterococcus, Eubacterium, Ruminococcus, Peptococcus, Peptostreptococcus, Bifidobacterium, Escherichia and Lactobacillus
3 | Oral cavity | 26 | Streptococci, Lactobacilli, Staphylococci, Corynebacterium and Bacteroides
4 | Vagina (urinogenital) | 9 | Lactobacillus, Atopobium, Peptostreptococcus and Staphylococcus
5 | Conjunctiva | 0 | Staphylococcus, Propionibacterium and Haemophilus
6 | Respiratory region | 14 | Prevotella, Sphingomonas, Pseudomonas, Acinetobacter, Fusobacterium, Megasphaera, Veillonella, Staphylococcus and Streptococcus
7 | Blood | 1 | Staphylococcus

Instability of human microbiome (adapted from Peterson et al. 2009 (NIH Human Microbiome
Project))
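
The 55% share attributed to the digestive system can be recovered from the percentages in Table 8.1. The sketch below assumes that “digestive system” means the gut plus the oral cavity; that grouping is our reading of the table, not something the source states explicitly.

```python
# Percentage of the total human microbiota per body niche, as listed in Table 8.1.
MICROBIOTA_SHARE = {
    "Skin": 21,
    "Gut": 29,
    "Oral cavity": 26,
    "Vagina (urinogenital)": 9,
    "Conjunctiva": 0,
    "Respiratory region": 14,
    "Blood": 1,
}

# Assumption: "digestive system" in the text means gut plus oral cavity.
DIGESTIVE_NICHES = ("Gut", "Oral cavity")

digestive_share = sum(MICROBIOTA_SHARE[niche] for niche in DIGESTIVE_NICHES)
total_share = sum(MICROBIOTA_SHARE.values())

# Rank the niches from the largest to the smallest share.
ranked = sorted(MICROBIOTA_SHARE.items(), key=lambda item: item[1], reverse=True)

print(f"Digestive system: {digestive_share}% of {total_share}%")
for niche, share in ranked:
    print(f"{niche}: {share}%")
```

Under this grouping the table is internally consistent: the seven niches sum to 100%, and the digestive niches together contribute the 55% quoted in the text.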


8.4 Human Microbiome and Human Health 


In order to better understand the impact of the human microbiome on human
health and disease, it is important to know not only the microbial density/load
but also the diversity of microbes colonizing different organ systems. Among the
organ systems assessed for microbial diversity, the gut was found to have the
highest diversity, followed by the mouth and skin. The vaginal region had the
least microbial diversity (Li et al. 2012). The highly diverse microflora of the
digestive system is perhaps due to the variable food habits of individuals, while
the diversity of the skin microbiota might be related to geographical differences
(Kau et al. 2011). The Shannon-Wiener index (H’), Simpson’s index (D), the
Brillouin index (HB), richness and evenness are some of the standard diversity
measures adopted in elucidating community diversity (Zar 2010). Each of these
diversity indices has its own limitations and advantages, and they are often used
in combination for a better understanding. Even in combination, these indices
fall short when one has to understand microbial communities across human body
habitats, specifically because they fail to capture low-abundance taxa. The tail
statistic (τ), a rank-based diversity measure similar to the standard deviation
statistic (σ), is suggested to best suit 16S profiles, which tend to exhibit a
long-tailed distribution (Li et al. 2012).
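
The diversity measures named above can be sketched in a few functions operating on a list of taxon counts. This is a minimal illustration, not the implementation used in the cited studies. The Shannon-Wiener index is computed with natural logarithms, Simpson's index is given in its dominance form, and the tail statistic is written in the form we understand Li et al. (2012) to use, tau = sqrt(sum of p_i (i - 1)^2) with taxa ranked from most to least abundant; treat that form as an assumption. All function names and the example counts are hypothetical.

```python
import math


def relative_abundances(counts):
    """Convert raw taxon counts to proportions, dropping absent taxa."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    return [c / total for c in present]


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in relative_abundances(counts))


def simpson_index(counts):
    """Simpson's index D = sum(p_i ** 2); a lower D means higher diversity."""
    return sum(p * p for p in relative_abundances(counts))


def pielou_evenness(counts):
    """Evenness J' = H' / ln(S); defined here as 0 for a single taxon."""
    s = richness(counts)
    return shannon_index(counts) / math.log(s) if s > 1 else 0.0


def brillouin_index(counts):
    """Brillouin index HB = (ln N! - sum ln n_i!) / N, via lgamma(n + 1) = ln n!."""
    present = [c for c in counts if c > 0]
    n = sum(present)
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in present)) / n


def tail_statistic(counts):
    """Assumed form: tau = sqrt(sum p_i * (i - 1) ** 2), taxa ranked by abundance."""
    ranked = sorted(relative_abundances(counts), reverse=True)
    # enumerate starts at 0, so the index already equals (rank - 1).
    return math.sqrt(sum(p * i * i for i, p in enumerate(ranked)))


# Hypothetical communities: one dominant taxon plus a short or a long rare tail.
short_tail = [1000, 10, 10]
long_tail = [1000] + [1] * 20

for name, community in (("short tail", short_tail), ("long tail", long_tail)):
    print(name, round(shannon_index(community), 3),
          round(tail_statistic(community), 3))
```

In this toy comparison both communities carry the same number of rare reads, yet the tail statistic changes far more than the Shannon-Wiener index when those reads are spread over many rare taxa, which is the behaviour the text attributes to a rank-based measure.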

The microbial community colonizing a healthy human body is dominated by 
four major phyla, viz. Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. 
At the genus level, the most predominant genera are Bacteroides, Bifidobacterium, 
Clostridium, Eubacterium, Fusobacterium, Peptococcus, Peptostreptococcus and 
Ruminococcus. Bacteroides alone constitutes about 30% of the total gut microflora 
(Sears 2005). Escherichia and Lactobacillus are the other two genera present to a 
lesser extent (Khanna and Tosh 2014). Apart from Bacteria, Archaea and Fungi are
the other groups of microorganisms found in variable numbers in the human
body. The common fungi include species of the genera Candida, Saccharomyces, 
Aspergillus and Penicillium (Hoffmann et al. 2013). 

In addition to the characteristic and systematic differences in the microbial 
diversity in different human body habitats, differences among individuals were 
also reported (Li et al. 2012). It is now becoming clear that the microbial com- 
munity differences among individuals hold a key to human health, diseases and 
treatment. Introduction (through dietary change) and/or extinction (due to
antibiotic treatment) of particular microbial groups would alter the community and
population structure of the microbiota, potentially bringing about functional
variation.

The development of new sequencing technologies, computational algorithms
and bioinformatic tools has made the exploration of the human microbiome
a frontier enterprise. The main focus in the recent past has been to
elucidate the ‘core’ microbiome occupying specific human body niches and to
ascertain interindividual differences among healthy humans. However, it is
critically important to discern the differences between healthy and diseased
individuals. Although the association of specific microbial communities with
physiologically different conditions of healthy subjects has been sporadically
reported since the late 1970s, there is now a steady stream of publications
reporting the microbial communities predominant in human subjects affected by
different diseases. In this chapter we have restricted our discussion to the
diseases manifested as a consequence of altered gut microbial community structure
(Fig. 8.1) due to medication. The specific diseases are Clostridium difficile
infection (CDI), autism spectrum disorder (ASD), diabetes, gastric cancer, obesity
and inflammatory bowel disease (IBD).

Fig. 8.1 Complex interplay between gut microbes, diseases and their symptoms. Colour of each
microbial-associated disease corresponds to its coloured phenotype. The dotted line between
the causative microbial lineages represents the cross talk through nonlinear signalling
interdependence


8.5 Microbiome and Clostridium difficile Infection (CDI) 


Clostridium difficile is a pathogenic native gut microorganism, found in 3 out of
100 adults and 7 out of 10 babies. In healthy individuals the population of
C. difficile is maintained at a negligible level that is insufficient to cause disease.
However, with the administration of broad-spectrum antibiotics, patients develop
gastrointestinal illness due to a toxin produced by C. difficile (Buss et al. 2015).
The disease is referred to as C. difficile infection (CDI). Although our knowledge
of CDI pathogenesis is still rudimentary (Britton and Young 2012), CDI is one of
the most ubiquitous and expensive nosocomial infections. CDI accounts for 25% of
all cases of antibiotic-associated diarrhoea (Bartlett 2002). Antibiotic-associated
diarrhoea is itself reported to coincide with a decline in the
carbohydrate-fermenting, butyrate-producing members of the phylum Firmicutes
(Britton and Young 2012). Further, it has been shown that even short-term
antibiotic treatment can bring about long-term changes in the gut microbiota that
are not necessarily reversed by discontinuing the treatment (Jakobsson
et al. 2010). The reduced microbial diversity would not only lead to invasion and
proliferation of pathogenic flora due to lowered resistance (Chang et al. 2008;
Britton and Young 2012) but is also responsible for the progression of the disease
(Freter 1955).

Over the past decade, increased morbidity and mortality, as well as relapse of 
C. difficile infection, have become more common (Khanna et al. 2012) due to the 
emergence of strain 027 of C. difficile (Karas et al. 2010; Marsh et al. 2012). 
Antibiotic resistance, sporulation ability and toxin production are suggested to be 
the potential contributors to virulence of historical ribotypes and C. difficile 027 
(Warny et al. 2005; Drudy et al. 2007; Merrigan et al. 2010; Lanis et al. 2010, 2012, 
2013). TcdA and TcdB are two large clostridial toxins produced by C. difficile
that are responsible for its major virulence, causing extensive tissue damage in
human disease (Taylor et al. 1981; Libby et al. 1982; Lyerly et al. 1986). Of the
two toxins, TcdB is the critical virulence factor (Lyras et al. 2009); it is
antigenically variable, more lethal and causes more extensive brain haemorrhage
(Lanis et al. 2013).

Due to the ever-increasing severity of CDI, many studies have been initiated to
unravel the details of disease progression, which may aid in designing
effective treatment. In humans, bile acids are secreted into the small intestine in
response to the consumption of food so as to facilitate absorption of fats and
fat-soluble vitamins and nutrients (Britton and Young 2012). Cholate and
chenodeoxycholate are the primary bile acids, which are conjugated to either of two
amino acids, glycine and taurine (Ridlon et al. 2006). Deoxycholate, a secondary
bile acid produced by the action of 7α-dehydroxylase on cholate, was reported to be
a potent C. difficile spore germinant but highly toxic to its vegetative cells.
Further, a bile acid (taurocholate) and an amino acid (glycine) were shown to
enhance C. difficile spore germination by 1000-fold (Sorg and Sonenshein 2008).
Antibiotic treatment perhaps reduces the members of the microbiota that are
involved in the conversion of cholate to deoxycholate, thus resulting in increased
levels of cholates and their derivatives. This in turn facilitates the germination
of spores and the growth and propagation of vegetative cells of C. difficile
(Britton and Young 2012). However, chenodeoxycholate is shown to inhibit spore
germination (Sorg and Sonenshein 2008), and hence non-metabolizable derivatives of
chenodeoxycholate could serve as therapeutics (Sorg and Sonenshein 2010).
Competitive exclusion of toxigenic C. difficile by non-toxigenic C. difficile
(Sambol et al. 2002), direct antagonism by intestinal microbiota such as Bacillus
thuringiensis, which secretes thuricin CD, a bacteriocin with narrow-spectrum
activity against C. difficile spores (Rea et al. 2010), and faecal transplantation
(Khoruts et al. 2010) are suggested as potential curative measures.





8.6 Autism Spectrum Disorders (ASD) and Gut Microbiome 


The gut microbes are now reported to make neuroactive compounds, including
neurotransmitters and metabolites that act on the brain via the vagus nerve, which
connects the brain and the digestive tract (Schmidt 2015). Disruptions of the healthy
microbiome are suggested to result in anxiety, depression and even autism.
Autism spectrum disorders (ASD) are complex neurobiological disorders
characterized by stereotyped behavioural patterns leading to visible impairment in
social interactions and communication (Johnson and Myers 2007). Both genetic
and environmental factors play an important role in ASD aetiology. Genetically,
ASD is linked with autosomal recessive inheritance, X-linked inheritance and
sporadic chromosomal anomalies. Among the environmental factors, gut
microbes have the potential to interact with the central nervous system (Collins and
Bercik 2009). The gut of autistic children had reduced bacterial richness compared
to that of neurotypical children. The altered gut microbiota was not due to
demographics or special diets but due to antibiotic treatment, which is suggested to
aggravate ASD-related behavioural symptoms (Kang et al. 2013). High levels of the
gram-negative bacteria Bacteroides vulgatus and Desulfovibrio have been reported in
autistic children (Finegold et al. 2010). Lipopolysaccharides (LPS) present in the
cell walls of many pathogenic gram-negative bacteria are suggested to damage many
tissues including the brain (Minami et al. 2007), leading to increased permeability
of the blood-brain barrier and thus facilitating the accumulation of high levels of
mercury in the cerebrum, which may aggravate ASD symptoms (Adams et al. 2008).
Glutathione, an important antioxidant responsible for heavy metal detoxification
in the brain, has been shown to be reduced in rats exposed to LPS (Zhu
et al. 2007). Depletion of glutathione could also be caused by p-cresol, the
formation of which is catalysed by a glycyl radical enzyme (p-hydroxyphenylacetate
decarboxylase) from C. difficile, a gram-positive bacterium (Selmer and Andrei
2001). As discussed above, C. difficile is known to play a crucial role in the
development of gastrointestinal (GI) illness. Thus the presence of autistic symptoms
and their correlated GI severity seems to be linked to reduced richness and diversity
of the gut microflora, which in turn might alter the physiological functionality and
microbial GI robustness owing to the decrease in microbial redundancy in ASD
children (Kang et al. 2013). Although a statistically significant correlation between
autistic symptoms and the abundances of the genera Prevotella and Coprococcus and
of unclassified Veillonellaceae is established, the severity of GI symptoms is not a
significant predictor of these microbial changes among autistic children (Kang et al.
2013). ASD children are reported to have a strong preference for starches, snacks
and processed foods while rejecting most fruits, vegetables and proteins (Field
et al. 2003; Sharp et al. 2013). Although the aetiological factors contributing to
feeding problems in ASD patients remain elusive (Mulle et al. 2013), a
neurobehaviourally influenced aetiology of the higher rates of constipation and
encopresis is reported in ASD (Ibrahim et al. 2009). The major function of the gut
microbiome of healthy individuals is to help in breaking down complex plant
polysaccharides and other dietary matter. The altered gut microbiome of ASD
patients is reported to be unable to assist in the breakdown of plant
polysaccharides, thus causing GI distress (Mulle et al. 2013). Hence, interventions
aimed at restoring the microbial balance in the gut of ASD individuals might improve
behaviours (Mulle et al. 2013).

8.7 Gut Microbiome, Obesity and Diabetes


The relation between obesity and gut microbiota was known as early as three 
decades ago, and the gut microbiota is shown to shift in response to host adiposity 
and nutrient intake (Musso et al. 2011). Several studies have suggested the involve- 
ment of gut microbiota in host metabolism, energy utilization and storage (Musso 
et al. 2011) leading to the development of fat mass and fat storage (Backhed et al. 
2004; Everard and Cani 2013). Bacteroides intestinalis, Bacteroides fragilis and 
Escherichia coli are suggested to be involved in generation of secondary bile acids 
in the colon (Fukiya et al. 2009), and bile acids are known to exert metabolic regula- 
tory functions in addition to favouring dietary lipid absorption (Keitel et al. 2008; 
Lefebvre et al. 2009). The development of obesity was found to be associated with 
the enrichment of Firmicutes—specifically Mollicutes—at the expense of 
Bacteroidetes in mice fed with high-fat/high-sugar diet compared to those fed with 
low-fat/high-polysaccharide diet (Turnbaugh et al. 2008). The microbiome of the 
obese mice showed enrichment in genes coding for enzymes that enable the extrac- 
tion of energy from otherwise indigestible alimentary polysaccharides suggesting 
increased energy extraction capacity of the gut flora of obese individuals (Turnbaugh 
et al. 2006; Musso et al. 2011). Further, gut microbiota is shown to play a major role 
in the onset of insulin resistance and type 2 diabetes (Backhed et al. 2004, 2007; 
Cani et al. 2007a; Shen et al. 2013) triggering low-grade inflammation—a common 
feature characterizing obesity and several other metabolic disorders (Everard and 
Cani 2013). Microbiota-derived lipopolysaccharides (LPS) are reported to be the 
key molecule involved in early development of inflammation and metabolic dis- 
eases (Cani et al. 2007b). Animal model studies have established that obesity is 
transmissible along with gut microbiota (Musso et al. 2011) as transplantation of 
microbiota from obese mice to germ-free wild-type recipient mice resulted in 
increased adiposity compared to those that received microbiota from conventionally 
raised lean wild-type littermates (Turnbaugh et al. 2006). 

Diet also plays an important role in changing the microbial diversity of gut 
microbiome. High-fat diet when given to both obese and lean genotypes was found 
to be associated with a decrease in Bacteroidetes and an increase in both Firmicutes 
and Proteobacteria (Hildebrandt et al. 2009; Turnbaugh et al. 2009). On the other 
hand, germ-free mice were found to be resistant to diet-induced obesity caused by 
consumption of a high-fat or high-sugar ‘Western’ diet (Backhed et al. 2007). A
study by Ley et al. (2005) clearly demonstrated that the gut microbiome had an
increased abundance of Firmicutes in both genetic and diet-induced obesity.

While type 2 diabetes is a metabolic disorder caused by obesity-linked insulin
resistance, type 1 diabetes (T1D) is a T-cell-mediated autoimmune disease resulting
from slow and progressive destruction of insulin-producing β cells (Zipris 2008).
Both genetic and environmental factors are known to contribute to autoimmune
disorders. Altered gut microbiota, impaired intestinal mucosal barrier and mucosal
immunity are reported to contribute to T1D pathogenesis (Musso et al. 2011).
Although specific details of how the gut microbiota regulates T1D are unknown,
T1D-resistant MyD88 KO mice were shown to harbour a lower Firmicutes/Bacteroidetes
ratio with an increased proportion of Lactobacilli, Rikenellaceae and
Porphyromonadaceae (Wen et al. 2008). The dynamic link between gut microbiota,
adiposity and diabetes indicates that manipulation of gut microbial communities by
dietary interventions (e.g. probiotics or prebiotics) and translocation could be an
approach to treat obesity and improve metabolic health (Flint et al. 2014).
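
The Firmicutes/Bacteroidetes ratio referred to in this section is a simple quotient of phylum-level abundances. The sketch below shows one way it might be computed from read counts; the function name and both sample profiles are hypothetical and invented for illustration, not data from the cited studies.

```python
def firmicutes_bacteroidetes_ratio(phylum_counts):
    """Return the Firmicutes/Bacteroidetes ratio from phylum-level read counts."""
    firmicutes = phylum_counts.get("Firmicutes", 0)
    bacteroidetes = phylum_counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("Bacteroidetes count is zero; the ratio is undefined")
    return firmicutes / bacteroidetes


# Hypothetical read counts, invented for illustration only (not study data).
lean_sample = {"Firmicutes": 400, "Bacteroidetes": 500, "Proteobacteria": 100}
obese_sample = {"Firmicutes": 700, "Bacteroidetes": 200, "Proteobacteria": 100}

lean_ratio = firmicutes_bacteroidetes_ratio(lean_sample)
obese_ratio = firmicutes_bacteroidetes_ratio(obese_sample)

print(f"lean: {lean_ratio:.2f}, obese: {obese_ratio:.2f}")
```

Because the ratio depends only on the two phyla, it is insensitive to changes elsewhere in the community, which is one reason it is usually reported alongside fuller diversity measures.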





8.8 Gut Microbiome and Inflammatory Bowel Disease (IBD)


Inflammatory bowel disease (IBD) involves chronic and recurring immune responses 
with relapsing and remitting inflammations in gastrointestinal tract. Aetiology is 
multifarious including genetic, microbial and environmental factors contributing to 
disease development (Cho and Blaser 2012). IBD primarily includes two subtypes, 
namely, ulcerative colitis (UC) and Crohn’s disease (CD). UC remains confined to 
the colon and rectum, while CD can affect different areas of GI tract including the 
mouth. These are characterized as autoimmune diseases with the identification of 
pathways involving NOD2, ROS, CARD9 and Th17 cells in genetically susceptible 
hosts (Cho and Blaser 2012). Genetic predisposition is in itself not sufficient for the 
onset and progression of inflammation. Microbial dysbiosis plays a key role in the 
onset and progression of IBD, indicating the complex interplay between the gut 
microbiome and genetic susceptibility to IBD (Knights et al. 2013). 

Microbial dysbiosis refers to the shift in relative abundances of dominant taxa 
and decrease in overall diversity of gut community (Sokol and Seksik 2010). It 
remains unclear whether this dysbiosis is the cause of or the response to the disease; 
nevertheless stable and healthy gut commensal bacteria are necessary to suppress 
the pathogenic infection (Kamada et al. 2012). Broadly, IBD is associated with 
reduced gut diversity, an increase in proportion of Gammaproteobacteria and 
reduced number of Firmicutes (Sokol and Seksik 2010). A significant decrease in 
abundance of two genera Roseburia and Phascolarctobacterium is associated with 
both UC and CD subjects (Morgan et al. 2012). In gut, species of the genus 
Roseburia are associated with production of butyrate and utilization of acetate 
(Duncan et al. 2002), whereas species of the genus Phascolarctobacterium are asso- 
ciated with production of propionate in coculture with Paraprevotella (Watanabe 
et al. 2012). Therefore, apart from changes in composition, functional imbalance
has also been witnessed in IBD subjects, including upregulation of sulphur
metabolism pathways and downregulation of butanoate and propanoate metabolism. A
few microbial clades are differentially abundant in CD and UC patients: the
proportion of Faecalibacterium of the Ruminococcaceae family (acetate producers) is
reduced and members of the family Enterobacteriaceae show a significant increase in
abundance in CD (Kang et al. 2010), whereas a significant reduction in members of
Leuconostocaceae is seen in UC (Morgan et al. 2012).

Epidemiological studies on concordance rates for IBD in German monozygotic 
twins (16% for UC and about 35% for CD) suggest stronger genetic influence in CD 
as compared to UC and also indicate the role of environmental factors in the devel- 
opment of chronic inflammation (Spehlmann et al. 2008). 
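
A concordance rate of the kind quoted above can be computed from counts of twin pairs. The text does not state whether the cited rates are pairwise or probandwise, so the sketch below assumes the simpler pairwise definition: concordant pairs divided by all pairs with at least one affected twin. The pair counts are hypothetical, chosen only so that the results match the quoted percentages; they are not the counts from the cited study.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of affected twin pairs in which both twins have the disease."""
    total_pairs = concordant_pairs + discordant_pairs
    if total_pairs == 0:
        raise ValueError("at least one affected twin pair is required")
    return concordant_pairs / total_pairs


# Hypothetical pair counts, invented so the rates match the percentages quoted
# in the text; they are not the counts from the cited twin study.
uc_rate = pairwise_concordance(concordant_pairs=4, discordant_pairs=21)
cd_rate = pairwise_concordance(concordant_pairs=7, discordant_pairs=13)

print(f"UC: {uc_rate:.0%}, CD: {cd_rate:.0%}")
```

Since monozygotic twins share their genome, a rate well below 100% is what points to environmental contributions, while the higher rate for CD than for UC is what suggests the stronger genetic influence in CD.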
Dietary intake is also correlated with the incidence of IBD. A diet with high amounts
of total fats, PUFAs, omega-6 fatty acids and meat was associated with an
increased risk of CD and UC, whereas high fibre and fruit intake were related to
a decreased risk of CD. High vegetable intake was linked with a decreased risk of
UC (Hou et al. 2011). Recently, the spread of the ‘Western diet’, rich in protein but
low in fruits and vegetables, is also being considered as a reason for the increasing
incidence of IBD.

Hence, there exists an interaction network between genetics, the host gut
microbiome and diet that provides feedback to host immune responses. For instance,
a weakened immune response to commensal bacteria in the gut can result from
mutations in NOD2 and GPR35 and, as a result, cause an imbalance in the taxonomic
structure of the gut microbiota, which can subsequently lead to metabolic dysbiosis.
Altered metabolic capabilities of the gut microbiome may further lead to diminished
antibacterial activity through different pathways and consequent taxonomic and
metabolic imbalance (Knights et al. 2013). Recently, alterations in the gut virome
have also been observed in IBD patients (Ray 2015).

Based on studies done so far, the treatments used for IBD are accompanied by
potential risks and side effects. However, the use of probiotics and prebiotics
alongside the clinical course is being tested as a cure; among these, using
Faecalibacterium as a probiotic is a promising strategy for counterbalancing the
composition of gut commensal bacteria in CD patients (Sokol et al. 2008). Symbiosis
factors from microbes can also be employed in therapeutics for inflammation; for
example, PSA (polysaccharide A) produced by B. fragilis is reported to suppress the
production of interleukin-17 (pro-inflammatory) by intestinal immune cells
(Mazmanian et al. 2008). Apart from these, researchers are trying faecal
bacteriotherapy (FBT), in which faeces from a healthy donor are transplanted into
the gut, as a treatment for UC, though it has not yet been approved by regulatory
authorities.





8.9 Microbiome and Gastric Cancers 


It is evident from the preceding discussion that the gut microbiota has significant 
influence on inflammation of the gut particularly the distal large intestine (Louis 
et al. 2014). The chronic inflammation of the gastrointestinal tract progresses to 
inflammatory bowel disease (IBD), and IBD patients are reported to show an 
increased incidence of colorectal cancer (CRC) also known as colitis-associated 
cancer (CAC) (Jess et al. 2005; Danese et al. 2011). More than 95% of CRC cases 
show an association with dietary lifestyle and more recently gut microbiota, while 
less than 5% are hereditary (Rustgi 2007; Watson and Collins 2010; Irrazabal et al. 
2014). CRC is ranked third among the most common causes of cancer-related 
deaths in the world (AICR 2007; Jemal et al. 2011; Irrazabal et al. 2014). Several 
pathogenic bacteria have been implicated in promoting CRC via pro-inflammatory 
interactions with host cells (Sears and Garrett 2014; Zackular 2014; Zackular et al. 
2014; Louis et al. 2014). The relative abundances of Ruminococcaceae, Clostridium,
Pseudomonas and Porphyromonadaceae were higher, while the relative abundances
of Bacteroides, Lachnospiraceae, Clostridiales and Clostridium were found to be
lower in patients with adenomas (Zackular et al. 2014). Further, patients with
carcinomas had higher relative abundances of Fusobacterium, Porphyromonas,
Lachnospiraceae and Enterobacteriaceae and lower abundances of Bacteroides,
Lachnospiraceae and Clostridiales (Zackular et al. 2014). Furthermore, Helicobacter
pylori has been identified as the primary cause of gastric cancer (Tu et al. 2008).
However, it is now becoming increasingly clear that collective activities of the met- 
abolic products of the microbiota greatly influence the predisposition to and protec- 
tion against CRC (Gill and Rowland 2002; Schwabe and Jobin 2013). Nitrosation 
of amines produced by fermentation of proteins in the large intestine by Bacteroides 
and Firmicutes leads to formation of N-nitroso compounds that have the potential to 
promote cancer (Rowland 2000; Louis et al. 2014) as indicated by the positive cor- 
relation between dietary intake of NOCs and CRC in European populations (Loh 
et al. 2011). Nitroreductases and nitrate reductases encoded by Proteobacteria are 
suggested to facilitate nitrosation (Louis et al. 2014). Ammonia, a product of
protein fermentation, is reported to be potentially carcinogenic at low
concentrations (Windey et al. 2012). Although polyamines are essential for the
maintenance of the structural integrity of membranes and nucleic acids, higher
levels of polyamines are associated with several diseases including cancer (Louis
et al. 2014), and certain gut bacteria, including enterotoxigenic B. fragilis,
upregulate polyamine production (Pegg 2013). Further, pathogens such as Shigella
flexneri, Streptococcus pneumoniae, Salmonella enterica and H. pylori are known to
exploit polyamines to increase their virulence (Di Martino et al. 2013). Colonocyte
barrier breakdown by toxic sulphide, produced as hydrogen sulphide in the gut by
sulphate-reducing bacteria related to Desulfovibrio spp., could be another causative
agent of CRC, as indicated by higher stool sulphide levels in CRC patients, although
increased levels of Desulfovibrio spp. have not been reported (Carbonero et al.
2012). However, several
bacterial pathogens such as B. fragilis, E. coli NC101 strain, Fusobacterium spp. 
and Campylobacter spp. seem to be directly and specifically involved in promoting 
CRC (Sears 2009; Arthur et al. 2012; Kostic et al. 2013). Further, there is a complex 
interplay between diet, bile acid and gut microbiota (Louis et al. 2014). Higher-fat 
intake is positively correlated with secondary bile acids (Ou et al. 2013), and sec- 
ondary bile acid deoxycholic acid is reported to promote liver cancer (Yoshimoto 
et al. 2013). Furthermore, higher levels of bile acids are reported from faecal sam- 
ples of CRC patients (Ou et al. 2012). 

Both animal and human studies suggest that dietary supplementation with non- 
digestible carbohydrates can reduce protein fermentation in the large intestine, lead- 
ing to decrease in the genotoxicity of faecal water (Windey et al. 2012), thus 
reducing the incidence of IBD as well as CRC. 





Conclusions 
We owe our very persistence in nature to the plethora of microbiota that has
colonized our various organ systems. In particular, the gut microbiota provides
important benefits in terms of primary breakdown of ingested food, immune
development and mental wellbeing. However, the full import of the role of the
human microbiome in health as well as disease has just begun to emerge with
the advent of culture-independent research technologies. Although specific
microbes have been implicated in causing and/or promoting specific diseases, it
is now becoming clear that it is the overall community structure of the microbiota
that is the ‘Lakshman Rekha’ that separates health and disease, and diet seems to
play a very crucial role in altering the community structure of the gut microbiota.
The way to a man’s heart is certainly through his stomach but via the microbiota.